Saturday, July 13, 2019

How to add your taxon to the Lori analysis

When designing the Lori analysis, I didn't want it to just be a single-use test.  With nearly every Mesozoic maniraptoromorph included, quantified characters, and a character list modified to avoid correlated or composite characters, it's the best published analysis to add your new taxon to, as long as it's not stemward of Ornitholestes or a member of crown Aves.  The corollary is that it's so detailed that the usual quick TNT run will not find the Most Parsimonious Trees.  Don't let that deter you though, as I'm writing this blog post to walk you through the steps of adding a new taxon and finding its most parsimonious position.

The first step is to score your new taxon.  You might notice I've included two NEXUS files at PeerJ.  This one is for scoring taxa in Nexus Data Editor (NDE).  It includes character and state descriptions to make this easy.  If your specimen is immature, I've added the option to score it 'N' for characters that are known to vary with ontogeny.  The character list indicates which characters qualify for this, but they're easy to notice in NDE too because they have a series of undefined states through state 9 before state N is listed (Figure 1).  Another advantage of NDE is that you can distinguish uncertainty polymorphies from variation polymorphies.  Uncertainty polymorphies, such as 'it either has six or seven sacrals, but I can't tell which', are indicated with a slash, as in '1/2'.  Variation polymorphies, such as 'some individuals have six sacrals and others have seven', are indicated with a plus sign, as in '1+2'.  If a feature is inapplicable, such as a tooth character for a toothless taxon, score it with a dash.  N, /, + and - are thus great ways to keep track of how much we know about your taxon.  Contrast this with Cau's MegaMatrix, which is entirely 0s, 1s and ?s.  It contains basically the same information TNT will use, but isn't as obvious or transparent.
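As a side benefit, these symbols make completeness easy to quantify.  Here's a minimal Python sketch (a hypothetical helper of my own, not part of the published files) that tallies them in an NDE-style score string:

```python
from collections import Counter

def score_summary(scores: str) -> Counter:
    """Tally the NDE scoring symbols in one taxon's score string:
    '?' = unknown, 'N' = varies with ontogeny (immature specimens),
    '-' = inapplicable, '/' = uncertainty polymorphy ('1/2'),
    '+' = variation polymorphy ('1+2')."""
    return Counter(c for c in scores if c in "?N-/+")

# A short made-up score string as an example:
print(score_summary("01?N-1/2 0+1??"))
```

Run on a full score row, this gives a quick sense of how much of your taxon's anatomy is actually known versus unknown, immature or inapplicable.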

Figure 1. Example of a Lori matrix in NDE.

These symbols would all work fine in PAUP, but that program is far too slow for the Lori analysis.  So instead I used TNT (Goloboff and Catalano, 2016).  The problem is that TNT recognizes a different set of symbols.  When NDE makes a NEXUS file from your matrix, uncertainty polymorphies are written with curly brackets, such as '{12}'.  Variation polymorphies are written with normal parentheses, such as '(12)'.  TNT doesn't recognize the difference and just uses curly brackets for all polymorphies.  Similarly, TNT doesn't recognize inapplicable states and doesn't allow another symbol like 'N' to count as an unknown state.  So you'll have to copy your list of scores into a word processor and 'replace all' normal parentheses with curly brackets, capital Ns with question marks, and dashes with question marks.  Now you have your entry ready for TNT.

Hesperornithoides                   ???3111??? ?????????? ?????????? ?10???01?? 0110?????? ?????????1 ?????????? ?0?{01}0?10?? ???????0(01)1 0100???1?? 1?01??000? ??1{12}1????? ?????11??1 (01)??????0?? ?????2{01}000 0??10000{01}0 00001000?? ???1?1???? ?????????? ?????????1 ?000110000 1?0000??10 0{12}0?01???? ??{01}??????? {01}1?????{12}?? ???{12}00???? ?????1???0 ???1????10 ???0?0???0 ??????1?0? 00?11???0? ??2?11???1 100001?1?? ???1??1011 00{12}?00?0?? 00???2???0 ?????1010? ?00????10{12} ??11{01}11?0? 1{23}?01?0?01 ??10???1?0 0??????1?? ?00021???? ?0??000?1? ????00???? 00?102001? ???10000?? ?????????? ??0000000? 110??????0 ?0???00??? ??0????11? ?????00?10 ?000?0?0?? ???0?????? ?????????? ??0?0????? ??0?0???10 ?????0???? ???000???? 0???01???1 -1?1000-00 ?0?1?????? ?0??00001? ?0???????? ????000{01}10 0200000?0- -00?1-?0?? ?????0??-1 01????????

becomes...

Hesperornithoides ???3111????????????????????????10???01??0110???????????????1???????????0?{01}0?10?????????0{01}10100???1??1?01??000???1{12}1??????????11??1{01}??????0???????2{01}0000??10000{01}000001000?????1?1???????????????????????1?0001100001?0000??100{12}0?01??????{01}???????{01}1?????{12}?????{12}00?????????1???0???1????10???0?0???0??????1?0?00?11???0???2?11???1100001?1?????1??101100{12}?00?0??00???2???0?????1010??00????10{12}??11{01}11?0?1{23}?01?0?01??10???1?00??????1???00021?????0??000?1?????00????00?102001????10000??????????????0000000?110??????0?0???00?????0????11??????00?10?000?0?0?????0??????????????????0?0???????0?0???10?????0???????000????0???01???1?1?1000?00?0?1???????0??00001??0????????????000{01}100200000?0??00?1??0???????0???101????????
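The 'replace all' step is simple enough to script.  Here's a minimal Python sketch (a hypothetical helper; apply it only to the score string itself, since taxon names could also contain a capital N):

```python
def nexus_to_tnt(scores: str) -> str:
    """Convert NDE/NEXUS score symbols to ones TNT accepts:
    parentheses (variation polymorphies) become curly brackets,
    and 'N' (ontogeny unknowns) and '-' (inapplicable) become '?'."""
    return (scores.replace("(", "{")
                  .replace(")", "}")
                  .replace("N", "?")
                  .replace("-", "?"))

print(nexus_to_tnt("(01)N-0"))  # -> {01}??0
```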

Now take the other NEXUS file, the one designed to run in TNT.  Change the number of taxa to add one for your new taxon under the 'ntax=' command (Figure 2), add your new taxon with its scores to the bottom of the matrix block (Figure 3), and then comes the important step.  I included one saved Most Parsimonious Tree in this NEXUS file, at the bottom after 'begin trees ; tree tnt_1 = [&U]'.  If you added your taxon to the base Lori TNT file with 501 taxa, yours is number 502.  So where it says '(1,(18,((2,3),(36,('... add your taxon as '(502,(1,(18,((2,3),(36,('..., being sure to include the comma, and then add another closing parenthesis to the end of the tree description before the semicolon where it says ',(59,60)))))))))));' .

Figure 2. Where to increase taxon number.
Figure 3. Where to insert your new taxon and scores.
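If you'd rather not edit the tree string by hand, the grafting step can be sketched in Python (a hypothetical helper mirroring the manual edit above; it simply wraps the saved tree in one more pair of parentheses with your new taxon, here number 502, as sister to everything else):

```python
def graft_at_root(tree: str, new_tip: str) -> str:
    """Add new_tip as sister to the whole saved tree description:
    '(1,(2,3));' with tip '502' becomes '(502,(1,(2,3)));'."""
    body = tree.strip().rstrip(";")
    return "(" + new_tip + "," + body + ");"

print(graft_at_root("(1,(18,((2,3),(36,37))));", "502"))
```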

Now save the NEXUS file and open it in TNT.  For our example, I've added the newly described scansoriopterygid Ambopteryx as taxon 502.  In TNT, select 'Trees' > 'View' and you'll see your taxon at the base of the tree, with the tree length shown at the bottom center as 'Len.' (Figure 4).  Here it's 12175, significantly higher than the shortest trees I found at 12123, because Ambopteryx would need a LOT of steps to place so basally.  Select 'Settings' > 'Lock trees' to unlock the cladogram, and now you can click just to the left of your new taxon's name.  When you then right click a node or just to the left of another taxon's name, your new taxon will move there.  If we move Ambopteryx to the base of Scansoriopterygidae, tree length drops to 12147.  You wouldn't expect it to get back down to 12123 unless your new taxon adds no new information.  Conversely, any information it adds has the power to change the topology of closely related taxa.

Figure 4. Your new taxon added and where to see tree length.
Now you let TNT use its power to find the best topology.  After increasing the 'Max. trees' under 'Memory' in 'Settings' to 10000, run a 'New Technology search' getting trees from 'RAM' using 'Sect. Search' (with 'CSS' unchecked), 'Ratchet', 'Drift' and 'Tree fusing'.  With Ambopteryx, this quickly finds 13 trees of length 12142.  One thing I've noticed is that a low number of trees, like 13, indicates there's more work to do.  So reset 'Max. trees' to 100 and run a 'Traditional search' of 'trees from RAM'.  This gets you 100 trees of that length to work with.  Now reset it to 10000 Max. trees and run the New Technology search from RAM again.  The new result is 100 trees of length 12142, which in my experience usually means those are the shortest trees you'll find.  You can keep alternating New Tech and Trad searches like this until you're satisfied, but end with a Trad search after increasing the Max. trees to 99999 to fully sample tree space.  In the present example, the topology within Scansoriopterygidae changed, the clade moved to the base of Paraves (1 step longer in the original matrix), and Pedopenna moved to Archaeopterygidae (1 step longer in the original matrix) (Figure 5).

Figure 5.  Taxon successfully added.
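The search protocol above boils down to a stopping rule: alternate Traditional and New Technology searches until the same best length comes back several times in a row (I use three).  Here's a Python sketch of that bookkeeping, where run_pair is a hypothetical stand-in for one Trad + New Tech pair returning the best tree length TNT reports (it is not a TNT binding):

```python
def search_until_stable(run_pair, stable_pairs=3, max_pairs=50):
    """Call run_pair() repeatedly until the best tree length it
    returns is identical stable_pairs times in a row, then stop."""
    lengths = []
    for _ in range(max_pairs):
        lengths.append(run_pair())
        tail = lengths[-stable_pairs:]
        if len(tail) == stable_pairs and len(set(tail)) == 1:
            return tail[0]
    return min(lengths)

# Toy stand-in for the lengths found by successive search pairs:
results = iter([12147, 12143, 12142, 12142, 12142])
print(search_until_stable(lambda: next(results)))  # -> 12142
```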
And that's how you add a new taxon to the Lori matrix.  Later in Lori Week, I'll show you the new and better way to run a constraint analysis in a huge matrix like this, and also how to track down where taxa with multiple equally parsimonious positions can go.  I'll also post some diagrams showing exactly what to measure for some of the potentially ambiguous quantified characters.

References- Goloboff and Catalano, 2016. TNT version 1.5, including a full implementation of phylogenetic morphometrics. Cladistics. 32(3), 221-238. DOI: 10.1111/cla.12160

Hartman, Mortimer, Wahl, Lomax, Lippincott and Lovelace, 2019. A new paravian dinosaur from the Late Jurassic of North America supports a late acquisition of avian flight. PeerJ. 7:e7247. DOI: 10.7717/peerj.7247

20 comments:

  1. That is not the correct way to re-run an analysis with a new OTU.
    The correct procedure is: score the taxon and run the new analysis following the same original protocol of the first analysis.
    You must re-run your analysis from zero, otherwise you artificially force the analysis to explore the tree islands starting from an arbitrarily assembled tree which was based on a set of hypotheses (the first matrix) which is different from the actual set of hypotheses (the matrix including Ambopteryx). You have forced TNT to explore the tree space from an arbitrary topology defined on a now-obsolete set of hypotheses.
    What if the inclusion of the new OTU has modified the tree space so radically that the previous optimum is now just a local optimum? Could the brief TNT run you performed in the second run be able to find the novel optimal island? I doubt this.
    So, the only valid way to run an analysis after a new OTU is scored is to merely re-run the whole analysis following the same original tree search strategy used for the first matrix.

    Replies
    1. Hold on now, your method is not re-running the analysis from zero either. You're forcing the analysis to explore tree islands starting from the incredibly arbitrary tree "random seed 1" which is based on no evolutionary hypotheses at all. Which is why it takes so long to stumble upon something actually supported by the data. If anything, the set of hypotheses forming the first topology are the null hypotheses for the topology after adding a new taxon. After all, we don't expect to have to restart our evolutionary hypotheses from randomly arbitrary chaos every time we learn about a new taxon. Instead we expect that most of the likely relationships will remain likely, and indeed that's what happens. In that sense, the most likely topology prior to a new taxon's addition is the most likely topology after its addition as well, and so is the best place to start the search.

      "Could the brief TNT run you performed in the second run be able to find the novel optimal island? I doubt this."

      This seems to be the unevidenced assumption on your part. If the mpts were otherwise unchanged apart from the new taxon's presence, I'd agree with you. But they do change, as noted above. Remember even a "brief" Trad search and New Tech search combo is testing several billion rearrangements. And I recommended doing this multiple times in a row. So I'd say in most cases TNT is able to find the optimal island. But that's a testable statement- anyone can manually move any taxa or clades in tree view mode, or constrain any topology they think might be missed by TNT and search that tree space. If they find shorter trees this way or starting from a random seed than I do by adding one taxon at a time as described above, that would support your idea. So far, I haven't located any shorter tree lengths this way though.

      Also remember that any run we do with over 10 taxa is only a heuristic process- one designed to save time but not necessarily find the shortest tree. And as a heuristic method, searching from a "known" shortest tree is not only logical as noted above but also extremely effective- cutting down tens of hours to ten minutes.

    2. I am perfectly aware of how phylogenetic analyses work and their epistemological basis. Your reply seems not to address the important point of why we perform phylogenetic analyses. Enforcing a previous tree based on a different matrix as starting point is not an unbiased way to explore tree space. Everybody would love to reduce computation time, but this cannot be done by violating the main assumption of phylogenetic analysis: that we explore tree space with no a priori assumptions on "preferred" topologies.

      I am also concerned that after adding just one single and not really complete taxon like Ambopteryx, the new MPT is 19 steps longer than the previous. Ambopteryx is not a weird taxon full of autapomorphies which is not comparable at all to other OTUs; it is another scansoriopterygid with a morphology very similar to the already scored Yi and Epidexipteryx. Why are the new shortest trees 19 steps longer than the previous ones? Based on my experience in adding new taxa to big matrices, I'd expect the new trees with Ambopteryx to be around 2-6 steps longer, but not 19.
      This raises the suspicion that the second analysis was not that exhaustive and thus found a local suboptimum.
      This is the reason for re-running the whole analysis each time.

    3. But isn't my assumption correct? We're not just assuming in an empirical vacuum- we know that if you have a huge extensively sampled dataset, adding a single taxon is going to leave most of the tree 'the same'. And by 'the same' I mean that the likely alternatives are going to stay likely and vice versa, not that no topological changes at all will occur. You even see this in true weirdos like Chilesaurus- topologies stay almost identical whether you add it to an ornithischian matrix, a sauropodomorph matrix, etc.. You even say yourself here that adding a new taxon is only around 2-6 steps longer, so no previously highly unparsimonious changes occur. If the assumption weren't true, there would be no point in tracking phylogenetic conclusions, because for all we knew, when the next discovered taxon was added every month the topology might be crazily different and highly unparsimonious in last month's matrix.

      But again, this issue can and will be tested. If my method's recovering local suboptima, we'll know at some point. The Ambopteryx example at least partially disproves it, because if TNT weren't capable of escaping local suboptima at all it couldn't drop five steps by changing other parts of the tree. Another thought is, what if it's more likely an analysis gets stuck in a local suboptimum starting from the random seed tree than from an MPT of the previous analysis? I've certainly had it do that before, but never found a shorter tree after my taxon adding method.

      Your concern about Ambopteryx adding that many steps would have to be purely with my scoring accuracy. Remember, just adding Ambopteryx to my 12123 tree anywhere inside Scansoriopterygidae added at least 24 steps. And that was before TNT searched any tree space. It has a branch length of 10 steps, so is not going to only add 2-6 steps even in the best of times. Then Pedopenna moving is at least another, Scansoriopterygidae moving is at least another, leaving up to 7 more steps. Some of those might be scansoriopterygids switching positions, and who knows what else moved slightly in other parts of the tree due to downstream effects of these things.

    4. Just a note that running TNT for another four hours last night when I slept checking 728 billion more trees still never found a result better than 12142 with Ambopteryx added. I'm running an even longer search while at work today, but at some point reality has to trump philosophy if it's true that my method (despite its bias) really does find the shortest trees.

    5. Well, it's clear to me what's going on: you're an optimist, Andrea is a pessimist, and we have no idea what we should realistically expect because there's no research on this. Maybe Ambopteryx is a fluke, maybe it's the normal case, who knows.

      Maybe, though, you'd find very different trees if you ran an analysis for a week. That should be worth trying as an experiment at some point.

      "Hold on now, your method is not re-running the analysis from zero either. You're forcing the analysis to explore tree islands starting from the incredibly arbitrary tree "random seed 1" which is based on no evolutionary hypotheses at all."

      So, TNT doesn't do addition-sequence replicates to generate the starting trees for the branch-swapping procedures?

      "any run we do with over 10 taxa is only a heuristic process- one designed to save time but not necessarily find the shortest tree."

      Branch-and-bound is guaranteed to find the shortest trees even though it doesn't look at every mathematically possible tree. But with each added taxon the necessary time increases so quickly that it's not considered feasible above 25 or so. (Maybe your new computer can do 30 in a week?) I'm under the impression that TNT can't even do it; it's really not used often.

    6. We should reduce the amount of a priori expectations, not encourage their increase. David is correct: I am pessimistic. And usually, being pessimistic about "naive" approaches is the winning option (and yours is naive, since it lacks a rigorous theoretical explanation).
      I am skeptical of every "naive heuristic" used in such an ad hoc way. Of course, you see the immediate positive help to your particular aims, but this is exactly the usual way of introducing a novel form of pollution: it ignores the side effects that it will produce on a larger scale.
      If you cannot generalize such an approach, for example by determining at what point taxon addition requires a full analysis and how much it diverges from the more rigorous strategy, I prefer keeping the latter and ignoring the former.

      Just because something works does not mean it is good.

    7. "Well, it's clear to me what's going on: you're an optimist, Andrea is a pessimist, and we have no idea what we should realistically expect because there's no research on this. Maybe Ambopteryx is a fluke, maybe it's the normal case, who knows."

      I'm a realist. If we get shorter trees from running the new analyses for longer times or from scratch, my method will have to be adjusted or abandoned. And it's not like you can add any amount of new data to the previous mpt or just run it once and get this to work. I recommend only one taxon at a time, or else you end up so far away from the previous length that distant previously rejected topologies have to be sorted through and rejected too. Similarly, I run each pair of analyses (trad then new tech) until I get the same length over three pairs. I've yet to see better trees after that. Which goes some way to generalizing the method, as requested by Cau. Can you do this when updating a taxon's scoring? Seems to work. Can you do this when adding characters? I don't know, I've never tried.

      "Maybe, though, you'd find very different trees if you ran an analysis for a week. That should be worth trying as an experiment at some point."

      The longest I've tried is 86 hours on various settings, so about half a week. But having tried over a hundred of the most intuitive constraint analyses and having checked countless manual rearrangements, I don't realistically see how there could be a better local optimum that I'm somehow missing. It would have to be either some unforeseen heterodox topology or somehow just the right mix of just the right normally unparsimonious changes that add together to make a significantly more parsimonious total. And that TNT missed these things in its quadrillions of checked trees.

      "So, TNT doesn't do addition-sequence replicates to generate the starting trees for the branch-swapping procedures?"

      It can, but I found a driven search works better. A driven search is the default, so I assume that's what Cau (2018) used too. What's funny is that my method is actually a kind of addition sequence, as those add the next taxon to the previously generated best tree. And while I don't manually test every alternative placement, surely those billions of trees checked by TNT do. So it's not so much a new method as a new application of an existing concept.

      "Just because something works does not mean it is good."

      Err...

      As self-defeating as that sentence was, here's an analogy. This is an epistemological shortcut I'm proposing, sure. But it's far from the only one we use. What about choosing and using outgroup taxa? If we were in a world without this convention, Alternate Cau could argue 'Hey now, you're introducing your bias into the analysis if you're assuming not only that a certain taxon is outside your ingroup but also that it's closely related enough to be relevant. If we're to be proper blind scientists here we should be including all of life's taxa in our analysis and the program will provide the best outgroup in its results.' And that's technically true and epistemologically pure. But that would take a LOT more time and effort, so we let our prior knowledge work its way into the process and choose one to a few outgroups. And it generally works. Even you use it. But for you to say 'I might use a priori outgroup expectations, AND a priori conspecific expectations, AND a priori homology expectations, AND a priori expectations for how to divide continuously variable characters, etc., but your proposal is a step too far'? It's a little contrived.

      (second run is 80% through...)

    8. Dear Mickey, you keep not providing any theoretical explanation for your procedure, and this makes your post more dangerous than useful, in particular if the reader is not sufficiently expert in phylogenetics and is thus unable to discriminate the theoretical basis of a scientific procedure from a mechanical algorithm for performing a TNT run.

      Even Dave Peters is able to produce tons of phylogenetic trees, but that is not science. What side of the barricade is yours? The scientific one, of those who follow the methodological rigor, or the pseudoscientific one, populated by the heterogeneous community which fills the Internet with any form of pseudoscience? Because your naive heuristic in this post is exactly what pseudoscience is: it apparently has the shape of a scientific protocol, but lacking a theoretical basis, it is not scientific.

      It seems that you have been unable to realize that even if your procedure "works" in this case, or even in a million cases, it does not represent a sufficient generalization which could justify its use.

      That was the meaning of my final sentence.

      I won't comment more on this post. There are more interesting elements of your published analysis to discuss (and may eventually be discussed, here or elsewhere).

    9. I respect your choice to end this topic, but will respond here to end the topic now that the second run just completed (turns out my '80%' estimate was for just one of the sect searches). That being said, I thought my statements above that "the set of hypotheses forming the first topology are the null hypotheses for the topology after adding a new taxon" and such are the theoretical explanation. If you want to know more about the variables needed for the method to work, it's not fully worked out yet, but the larger the dataset, the denser the taxon sampling, and the fewer steps added at once, the more likely the method is to work.

      If you're demanding some mathematical or logic formulation to objectively show the method will lead to good results, not even the standard methods of phylogenetic analyses do this. Just read Goloboff (1999:418-419) on sectorial searches. Like me he gives instructions on how to perform them, lays out some generalizations of what works best ("The best size seems to be 35 to 55 nodes", "For that value of S, R = 3 and r = 3 are enough to make it likely that an optimal tree for the reduced data set is found", "The number of sector selections needed to produce the best results varied with data set size", etc.), and then states and shows the results are fast and short.

      "Even Dave Peters is able to produce tons of phylogenetic trees, but that is not science. What side of the barricade is yours?"

      Peters judges his results based on subjective criteria like "a gradual accumulation of traits" and "sister taxa that look like each other", oh and his misunderstanding that fewer MPTs means a more accurate analysis. Whereas I'm just using the agreed-upon objective standard of tree length for my method.

      "It seems that you have been unable to realize that even if your procedure "works" in this case, or even in a million cases, it does not represent a sufficient generalization which could justify its use."

      You're right, I am unable to realize that. What you just described is the exact philosophy of inductive reasoning, which much if not all of science depends on. If a process works in its first million cases, of course we'd all use it. You'd have to be irrationally stubborn to not do so at that point.

      The second analysis checked 3.3 trillion (3,305,098,469,748) trees in 29 hours 4 minutes, and still never found trees shorter than 12142. Without macros I can't even run an analysis from a tree in RAM longer than that. While nothing's ever proof in induction, I'd say that's additional evidence my method works.

    10. Science, strictly speaking, has nothing to do with induction. We don't infer that the sun will rise tomorrow from the fact that it's done so throughout recorded history; that is merely enough to let us form a hypothesis, not to test it. Instead, we deduce that the sun will rise tomorrow from the theory of relativity and the conservation of energy. We'll test it tomorrow morning by observation, not by induction.

      However, you've just tested the hypothesis that there are shorter trees over 3.3 trillion times. Clearly, with this dataset, your method remains in the running at the very least.

    11. On the minimum length standard:

      Gregory S. Paul has questioned whether minimum length works with parallel evolution and/or reversals while losing flight.

      I am not a paleontologist, so I may have missed something. Are there any studies comparing osteological and molecular trees for paleognath birds? Because that could shed light on whether there's any major exception to minimum length, and if so, whether there's some way to re-weight certain changes.

    12. I guess I'm curious -- I'm really interested in the conversation that unfolded above, but disappointed Andrea stepped out of the conversation.
      I would be interested to know how we could demonstrate that this method DOESN'T work -- basically, just picking multiple data sets and letting analyses keep running over and over to see what the outcomes are?

    13. Andrea, this is how I always do it in PAUP*.

  2. Fortunately, the distinction between polymorphism (weird that you call it "variation polymorphies", I've never seen that before) and partial uncertainty ("uncertainty polymorphies") does not have any effect on tree search. It can drastically change tree length though. If there's any polymorphism in your matrix, the lengths given by TNT and by Mesquite for the same tree will never match up; those given by Mesquite will always be longer, because Mesquite always distinguishes polymorphism from partial uncertainty (the former adds steps within terminal branches, the latter does not), while TNT always treats the former as the latter. PAUP* can be set to all three options.

    I use Mesquite instead of NDE because Mesquite can edit matrices, display trees, let you mess with topologies and branch lengths, reconstruct the evolution of characters on trees, and much else.

    "These symbols would all work fine in PAUP[*]"

    Sort of. While lots of people seem to believe that "-" means "inapplicable", as far as PAUP* is concerned it means "gap in a molecular sequence". By default, PAUP* is set to treat gaps as missing data, but make sure you haven't set it to treat gaps as a 5th base/21st amino acid, because that's the other available setting.

    Replies
    1. I should perhaps make explicit that there's no software which treats inapplicable scores as anything but missing data. Using a different symbol serves only the readers, not the software.

  3. Replies
    1. Where I have the time to write it. What do you, the public, want? A look at a subgroup and the steps it takes to change topologies? How new taxa interact with the Lori matrix? Something else?
