Tuesday, March 28, 2017

Ornithoscelida Tested- Adding taxa and checking characters

So the last post reported on Baron et al.'s (2017) new paper that recovered a theropod-ornithischian clade excluding sauropodomorphs, which they inaccurately resurrected the name Ornithoscelida for.  Now that I've been able to construct a TNT file from their matrix and have coded a few relevant taxa for it, we can explore it in further depth.

First, recall I questioned the authors' method of constraining Saurischia- "They tested this in an IMO un-ideal way, by constraining all 42 saurischians to be in an exclusive clade.  Would have been better to just use a backbone of e.g. (Euparkeria, Lesothosaurus (Plateosaurus, Coelophysis)), in case e.g. herrerasaurids aren't dinosaurs."  So I did the latter (replacing Euparkeria with Postosuchus due to how TNT works) and found Saurischia is actually only 15 steps longer, not 20.  Herrerasaurids are still sauropodomorphs, and Saltopus and silesaurids are still in a trichotomy with Dinosauria.  I also wondered how many steps were needed to move Eoraptor to Sauropodomorpha.  The answer is 22(!).  Hmm.  And ornithischian silesaurids?  28 more steps, which results in Saurischia.
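
To make the difference between the two kinds of constraint concrete, here is a minimal Python sketch. It is not TNT syntax and not what either analysis actually ran. It assumes trees written as nested tuples, and the toy topologies and function names are invented for illustration. The point is that a backbone constraint is only checked after pruning the tree down to the backbone taxa, so unlisted taxa such as herrerasaurids are free to land anywhere.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Drop every taxon not in `keep`, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clades(tree):
    """Return every clade below the root as a frozenset of taxon names."""
    if tree is None or isinstance(tree, str):
        return set()
    found = set()
    for child in tree:
        if not isinstance(child, str):
            found.add(frozenset(leaves(child)))
            found |= clades(child)
    return found


def satisfies_backbone(tree, backbone_taxa, required_clade):
    """True if `required_clade` exists once the tree is pruned to the backbone taxa."""
    pruned = prune(tree, set(backbone_taxa))
    return frozenset(required_clade) in clades(pruned)


BACKBONE = ["Postosuchus", "Lesothosaurus", "Plateosaurus", "Coelophysis"]

# Hypothetical toy trees, not results from the real matrix.
toy_saurischia = ("Postosuchus", ("Herrerasaurus",
                  ("Lesothosaurus", ("Plateosaurus", "Coelophysis"))))
toy_ornithoscelida = ("Postosuchus", ("Herrerasaurus",
                      ("Plateosaurus", ("Lesothosaurus", "Coelophysis"))))

print(satisfies_backbone(toy_saurischia, BACKBONE, ["Plateosaurus", "Coelophysis"]))      # True
print(satisfies_backbone(toy_ornithoscelida, BACKBONE, ["Plateosaurus", "Coelophysis"]))  # False

# Forcing every "saurischian" (here including Herrerasaurus) into one exclusive
# clade is a stricter test, and the first toy tree fails it.
all_saurischians = frozenset(["Herrerasaurus", "Plateosaurus", "Coelophysis"])
print(all_saurischians in clades(toy_saurischia))  # False
```

The first toy tree passes the backbone test but fails the all-saurischians test, which is exactly the case the backbone approach is meant to allow.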

But one aspect I didn't initially notice is that Daemonosaurus is missing from Baron et al.'s analysis.  This genus has a skull which is rather odd for a supposed basal theropod- short-snouted with pronounced heterodonty.  It also didn't include weird theropod-ornithischian cross Chilesaurus or theropod-like supposed basal sauropodomorph Buriolestes.  So I added these three taxa to Baron et al.'s matrix.

Phylogeny of the Baron et al. 2017 dataset after adding Buriolestes, Daemonosaurus and Chilesaurus (highlighted).

What happens?  Ornithoscelida is still a thing, but herrerasaurids move to outside Dinosauria.  Buriolestes is the most basal sauropodomorph and  Daemonosaurus is a theropod sister to Tawa, which are normal placements for these taxa.  Chilesaurus is an ornithischian sister to Pisanosaurus at the base of the clade, which is interesting.  Both are South American...  Though note Chilesaurus' proposed avepod relatives (basal tetanurines) aren't in the matrix.

This makes Saurischia only ten steps longer than Ornithoscelida, with Buriolestes as the basalmost theropod and herrerasaurids still basal sauropodomorphs in that case.  A sauropodomorph Eoraptor is now 16 steps longer.  21 more steps are needed for ornithischian silesaurids, with Buriolestes again a theropod and Chilesaurus remaining sister to Pisanosaurus.

Unfortunately, coding these taxa exposed a ton of problems with Baron et al.'s matrix.  First it doesn't really have 458 characters.  Weirdly enough, 6 characters are coded the same for every taxon and another 47(!) are only coded as different in one taxon.  Which makes them useless for resolving relationships, or "parsimony uninformative" in the technical jargon.  Taking into account the fake character zero that's only there because of how TNT works, the matrix actually only has 404 characters that are doing anything.  Still big, but not so much bigger than Nesbitt et al.'s 315.
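
For anyone who wants to run the same kind of tally on a matrix, here is a rough Python sketch. It assumes the matrix is held as a dict of taxon name to a string of single-character scores, ignores polymorphic scores, and uses the usual rule for unordered characters (a character is parsimony informative only if at least two states are each scored in two or more taxa). The toy matrix and function names are made up; the last lines just redo the arithmetic from the paragraph above.

```python
from collections import Counter

MISSING = {"?", "-"}


def classify(column):
    """Label one character column as constant, variable but uninformative, or informative."""
    counts = Counter(state for state in column if state not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # Informative: at least two states that are each scored in two or more taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "variable_uninformative"


def tally(matrix):
    """Count how many characters fall into each class."""
    columns = zip(*matrix.values())
    return Counter(classify(column) for column in columns)


# Hypothetical four-taxon, four-character matrix for illustration only.
toy_matrix = {
    "TaxonA": "0010",
    "TaxonB": "0011",
    "TaxonC": "0101",
    "TaxonD": "01?0",
}
print(tally(toy_matrix))

# The numbers given in the post: 458 columns in TNT, minus the dummy character 0,
# minus 6 constant characters, minus 47 that differ in only one taxon.
reported_total, dummy, constant, single_taxon = 458, 1, 6, 47
print(reported_total - dummy - constant - single_taxon)  # 404
```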

Also, some of the characters are badly formed at a Petersian level. 
Check out 167- "Dentition: 0, homodont; 1, slightly heterodont, with small observable changes across tooth rows; 2, markedly heterodont, clearly distinct types of teeth present (modified from Parrish, 1993; Nesbitt, 2011). ORDERED."  This ignores the fact that heterodonty can exist in numerous ways that aren't necessarily homologous at all.  The anterior peg teeth, big canines and crushing cheek teeth of heterodontosaurids (coded 2) aren't just a further development of Eoraptor's (coded 1) anterior leaf shaped teeth which grade into posterior generalized carnivorous teeth, for instance.  The direction of gradation to herbivorous teeth is even different there.
Or how about character 139- "Foramen located on the dorsal (and sometimes lateral) face of the surangular (surangular foramen): 0, present; 1, absent. NEW."  This codes for whether an anterior or a posterior surangular foramen (or both) are present, with the next character coding for which one (or both) exist- "Surangular foramen: 0, both foramen (anterior, dorsally positioned and posterior, laterally positioned) remain open; 1 only the foramen on the dorsal surface of the surangular, anterior to or at the point of maximum mandibular depth remains open; 2, only the foramen located laterally, posterior to the point of maximum mandibular depth remains open. NEW."  But the very fact both foramina can exist in the same element shows they are not homologous with each other, so a taxon with only a posterior foramen shouldn't be counted as having the same state in character 139 as a taxon with only an anterior foramen (e.g. Coelophysis and Eocursor in the matrix), but they are.
Or character 274- "Metacarpals I and V: 0, both substantially shorter in length than metacarpal III; 1, only metacarpal I longer than or subequal to metacarpal III; 2, only metacarpal V longer than or subequal to metacarpal III; 3, both are longer than or subequal to metacarpal III (modified from Butler et al., 2008)."  This just provides all the permutations of mcI and mcV length relative to mcIII.  But the fact you need all four permutations shows mcI and mcV length don't covary, so shouldn't be covered by the same character in the first place.  This kind of coding also hides homology, since e.g. the metacarpal V reduction in states 0 and 1 won't be counted as more similar to each other by TNT.  If you're wondering which dinosauromorphs have only mcV longer or subequal to mcIII, the matrix says it's Dilophosaurus.  Which is untrue, as Dilophosaurus' mcV is a tiny nub.  Character 434 does the same thing, but for shaft widths of mtI and mtV.
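
A quick Python sketch of why the composite coding hides homology. It assumes character 274 is run unordered (the character list doesn't mark it as ordered), and the two binary replacement characters are my own hypothetical split, not something from the paper.

```python
# Composite character 274 states -> (mcI long?, mcV long?),
# where 1 means "longer than or subequal to metacarpal III".
SPLIT = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}


def composite_cost(a, b):
    """Steps between two states of the unordered four-state character."""
    return 0 if a == b else 1


def split_cost(a, b):
    """Steps between the same two conditions scored as two binary characters."""
    return sum(x != y for x, y in zip(SPLIT[a], SPLIT[b]))


for a, b in [(0, 1), (0, 3), (1, 2)]:
    print(a, b, composite_cost(a, b), split_cost(a, b))
# 0 1 1 1
# 0 3 1 2
# 1 2 1 2
```

With the composite character every pair of states is one step apart, so states 0 and 1 (which share a short mcV) are treated as no more similar than states 1 and 2 (which share nothing). Splitting it into two characters lets the shared condition count.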

Metacarpus of Dilophosaurus wetherilli (UCMP 37303?) showing metacarpal V (outlined and arrowed).  I wouldn't assign that the state "only metacarpal V longer than or subequal to metacarpal III"... (after Xu et al., 2009).

There are also a lot of correlated characters- 2 and 3; 7, 9 and 10; 21 and 24; 139 and 140; 147 and 149; 167 vs. 168, 171 and 180; 215, 221, 222 and 225; 244 and 245 (but nothing is coded 0 for 245 though Postosuchus should be...); 252 and 253; 260 and 261; 279 and 278; 292, 306 and 308; 293, 294, 296, 307 and 326; 295 and 297; 314 and 316; 328 vs. 323 and 324; 344 and 345; 378 and 380; 411 and 412; 435, 436 and 446.  These seem to happen when the authors took characters from different sources but didn't realize they cover the same ground.  So for example, 215 codes for sacral number (from Butler et al.), 221 codes for a vertebra inserted between ancestral sacrals 1 and 2 (from Nesbitt et al.), 222 codes for the number of dorsosacrals (from Gauthier) and 225 for the number of caudosacrals (supposedly new).  And now sacral number is weighted more than it should be in the matrix.

Similarly, while Baron et al. did order a number of characters, many more should have been- 10, 11, 52, 58, 80, 92, 107, 129, 151, 154, 155, 179, 194, 306, 320, 324, 329, 336, 341, 354, 358 and 403.  They also ordered 333 in the wrong way- "Shaft of pubis (postpubis), shape in cross-section: 0, blade-shaped; 1, rod-like; 2, rod-like, but with a tapering medial margin (tear-drop shaped) (modified from Butler et al., 2008)."  This should have states 1 and 2 flipped, since tear-drop shaped is the intermediate shape.
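
Here's a small Python sketch of what flipping states 1 and 2 does to character 333. It assumes an ordered character costs the absolute difference between state numbers, and the recoding helper and example score string are invented for illustration.

```python
def ordered_cost(a, b):
    """Steps between two states of an ordered character."""
    return abs(a - b)


SHAPES_ORIGINAL = {"blade": 0, "rod": 1, "tear-drop": 2}
SHAPES_FLIPPED = {"blade": 0, "tear-drop": 1, "rod": 2}

# Swap states 1 and 2; anything else (0, ?, -) is left alone.
FLIP = {"1": "2", "2": "1"}


def flip_column(scores):
    """Recode a string of scores for character 333 with states 1 and 2 swapped."""
    return "".join(FLIP.get(score, score) for score in scores)


# As published, rod-like is the intermediate: blade -> rod costs one step.
print(ordered_cost(SHAPES_ORIGINAL["blade"], SHAPES_ORIGINAL["rod"]))  # 1
# After the flip, tear-drop is the intermediate: blade -> rod costs two steps.
print(ordered_cost(SHAPES_FLIPPED["blade"], SHAPES_FLIPPED["rod"]))    # 2
print(flip_column("0121?"))  # 0212?
```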

The authors didn't take ontogeny into account when coding fusion characters (351, 422, 438 and 445), so that taxa known only from young individuals (e.g. Tawa, Pantydraco, Panguraptor, Liliensternus) are coded as if they are adults.

A final minor note that doesn't affect the analysis itself is there are way too many characters credited as "NEW" which are anything but.  I don't expect every other matrix to be scoured, but there are some basic characters credited as new here- nasolacrimal crests (48); deep basisphenoid recess (100); posterior exposure of basisphenoid recess/plate (108); dorsal expansion of the anterior dentary (125); fan-shaped dorsal neural spines (211); fused sacral neural spines (219); presence of a caudosacral vertebra (225); metacarpal V absence (278); highly reduced ischial peduncle of ilium (316); obturator and pubic foramen presence (338); distal notch between pubes (348); pelvic fusion (351), etc., etc.

Overall, even if it was coded correctly, I don't think I'd trust this analysis within the 15 steps needed to dump Ornithoscelida.  So consider my earlier support withdrawn.  So disappointing.  And that's not even getting to the coding accuracy, which is coming next time...

References- Xu, Clark, Mo, Choiniere, Forster, Erickson, Hone, Sullivan, Eberth, Nesbitt, Zhao, Hernandez, Jia, Han and Guo, 2009. A Jurassic ceratosaur from China helps clarify avian digital homologies. Nature. 459, 940-944.

Baron, Norman and Barrett, 2017. A new hypothesis of dinosaur relationships and early dinosaur evolution. Nature. 543, 501-506.

90 comments:

  1. Chilesaurus as sister taxon to Pisanosaurus... I know the result is only provisional (due to lack of basal tetanurines), but out of interest, what characters unite Chilesaurus + Pisanosaurus?
    If (big if) this relationship holds, the name Pisanosauridae is available.

    1. 175. Maxillary teeth, posterior cutting edge of posterior maxillary teeth: 1, convex (modified from Sues et al., 2003; Clark et al., 2004; Nesbitt, 2011).
      178. Extensive planar wear facets across multiple maxillary/dentary teeth: 1, present (Weishampel and Witmer, 1990; Nesbitt, 2011; Han et al., 2012).
      373. Level of most proximal point of anterior trochanter (lesser trochanter) relative to level of proximal femoral head: 1, anterior trochanter positioned proximally, approaches level of proximal surface of femoral head (modified from Butler et al., 2008).
      415. Anterior ascending flange of the astragalus: 0, less than or equal to the height of the dorsoventral extent of the posterior side of the astragalus; (modified from Gauthier, 1986; Novas, 1992, 1996; Benton, 1999; Rauhut, 2003; Nesbitt, 2011).

      Character 373 is particularly troubling since Pisanosaurus does not preserve the proximal femur. Yet it seems it was coded for 18 proximal femur characters. If that's not worrying, I don't know what is.

    2. Is the description particularly craniocentric? I recently visited a collection to see all the material known of two temnospondyls where the skulls have been described and illustrated but the fragments of the lower jaw and postcranium have barely been mentioned.

    3. No, Casamiquela (1967) was very thorough with the postcrania, even noting "a pair of tiny fragments that could have corresponded to the shaft of a tibia and fibula" and "A fragment of sharp bone of roughly ellipsoidal or flattened cross-section" which he lists as Problematicum. He says "The mesiodistal portions of both femora are preserved..." Bonaparte (1976) says Casamiquela described all the remains except for three metacarpals and the sacrum-pelvis section. Most recently, Irmis et al. (2007) state "The two partial distal femora are too incomplete to provide any phylogenetic information."

    4. Maybe the referral of some parts is unclear? After all, if Casamiquela described indeterminate fragments, why would he completely omit the bones Bonaparte mentioned?

      The alternative, of course, is that Baron et al. either scored another taxon and slipped the line, or perhaps they were scoring a fictitious reconstruction drawing (if one exists).

    5. Good question. Hey Baron! What was your reasoning? Better to figure this out now than bring it up in our paper with a possibly invalid excuse. There are no email addresses in their paper as far as I can tell...

    6. I submitted a comment posing this question on his first blog post. It's in moderation.

    7. As is my positive comment submitted a few days back. So we will see if he ever approves comments sitting in moderation. :-(

    8. To answer David's question a bit better, the areas Casamiquela missed were all impressions, not preserved elements. So maybe Casamiquela didn't notice the impressions.

    9. Oh. There's nothing wrong with scoring impressions. I've published on a specimen that is nothing but a natural mold (me & Witzmann 2015)... and such famous fossils as Triadobatrachus are natural molds as well! I'll submit another comment.

    10. Ditto Scleromochlus (weirdly absent from recent avemetatarsalian analyses), Saltopus, etc.

  2. BTW, the tree is so small it's pretty much impossible to read.

    1. Forgot to mention... nice work. It would have been even nicer if this kind of forensic dissection of the phylo analysis had occurred during Nature's peer review process.

    2. Thanks! For the tree- right click > View Image and magnify.

    3. Works - thanks.
      Tree had Saltopus as basal herrerasaurid...?

    4. Well, that was just the initial tree from after the New Technology search. As was recently revealed on Twitter, Baron et al. didn't run their initial trees through a Traditional search to explore all the treespace. When I do that, Saltopus can go in several positions, as the authors report. One is as a herrerasaur, but it can also be sister to Herrerasauridae+Dinosauria (and maybe sister to Dinosauria too, I don't have the patience to check all 3564 mpts).

    5. "It would have been even nicer if this kind of forensic dissection of the phylo analysis had occurred during Nature's peer review process."

      There are only two reviewers on this planet who read supplementary information. One is Heinrich Mallison. I'm the other one, and I don't even work on dinosaurs.

    6. David, add me in the list.
      I re-run all phylogenetic matrices I have to review, often performing additional tests that I then include in my review to help the authors produce a more robust analysis.

    7. Awesome! I did that on the one matrix I've been given to review because the authors hadn't reported the strict consensus at all, just the majority-rule consensus. Do you complain at length about character lists? :-)

    8. I usually am asked to review descriptions of new taxa, i.e., manuscripts that include a phylogenetic analysis among the tools used, but not devoted exclusively to a phylogenetic matrix.
      My main interest as a reviewer of those manuscripts is to check that the results they provide are replicable. Often, the attached matrices are badly formatted, or do not produce the trees the authors discuss in the text.
      As a reviewer, my role is not to propose changes to or complain about the data they want to use (this is something that the whole scientific community must do), but to certify that the result they claim to produce is actually produced by the data they provided.

    9. I mean redundant characters and characters whose states aren't defined clearly enough for people other than the authors to score with confidence.

  3. Wondering what topological effects throwing in Isaberrysaura would have on the placement of Chilesaurus in this dataset, Mickey?

    Also, the material of Pisanosaurus is so hard to score (some character scores being difficult to determine if you check Irmis et al. 2006), I wonder what effect on the placement of Chilesaurus is if Pisanosaurus is removed.

    1. The Pisanosaurus question is easy enough to investigate. Deleting it destabilizes the tree, so that Chilesaurus is sometimes the basalmost ornithischian and sometimes sister to Ornithoscelida. Buriolestes is then sometimes sister to that group, and Eucoelophysis and Diodorus sometimes join Chilesaurus.

      Dunno about Isaberrysaura, but the more I look, the less I'm inclined to think that this dataset is useful enough to matter. :|

  4. I started trying to code Chilesaurus into the matrix myself (haven't finished yet, and I suppose the point is moot) and I have to say I'm pretty disappointed with the characters too. I'm kind of hoping the matrix still supports Ornithoscelida with the mistakes corrected, though, just because it'd be incredibly disappointing if there was this much hype over nothing.

    Since I'm a sauropod guy, the first thing that stuck out to me with the codings was those for the sauropods, even though they probably don't affect the Sauropodomorpha-Theropoda-Ornithischia split much. Tazoudasaurus is way better known than that, for one.

  5. Nice test, Mickey. Although it does not dismiss Ornithoscelida, it confirms that taxon sample matters... and adding other taxa may further weaken it.
    I noted too the series of badly defined, correlated or uninformative characters: I suspect the last are present because the dataset used in the Nature paper is based on a larger matrix having more taxa (I noted several states not scored in the matrix but listed in the character list that are tetanurine features).

    The "heterodonty" and bizarre composite metacarpal characters are quite disturbing... several assumed homologies with no sense.

    A basal dinosauriform position for herrerasaurs also results in some iterations of my dataset.
    For Chilesaurus, once you score averostrans I am quite sure it will return home (aka Tetanurae)...

    1. Ruta & Coates (2007) included several characters that they said were uninformative (which is correct in most cases) and said that they were meant for future expansions of their dataset (which haven't been made). A few other uninformative characters were included without comment.

  6. PUUUUUUUUUUUUUUUBLIIIIIIIIIIIIIIIIIIIIIISH!!!!!!!!1!!!!1!

    Tell me how I can help, and I'll do it.

    Listen: this is more urgent than our manuscript on artificial missing data. Considering its importance both inside the field and for the media, it's even more urgent than Lori.

    "These seem to happen when the authors took characters from different sources but didn't realize they cover the same ground. So for example, 215 codes for sacral number (from Butler et al.), 221 codes for a vertebra inserted between ancestral sacrals 1 and 2 (from Nesbitt et al.), 222 codes for the number of dorsosacrals (from Gauthier) and 225 for the number of caudosacrals (supposedly new). And now sacral number is weighted more than it should be in the matrix."

    Ah yeah. That's familiar; Ruta & Coates (2007) seem to have done it several times in their matrix of early tetrapods (me & Laurin forthcoming, preprint 2016).

    Some of their sources were evidently autapomorphy lists, so that only one state was even described, and the other is "everything else" (nonadditive binary coding). Did Baron et al. do this, too?

    1. I should mention that I'll have time very soon.

    2. You should absolutely do this. Do the PeerJ route so it's open access and you can get input at the preview stage from more than just the couple or three reviewers. This is important for many reasons.
      TA Dececchi

    3. Please do publish this - I think it's of vital importance.

    4. I agree--this should see publication.

    5. To your question re: autapomorphy lists, each proposed Lesothosaurus autapomorphy from Baron et al.'s (2016) redescription of its postcrania is included (characters 13, 38, 142 and 441). The latter two are only coded as present in that taxon. Luckily, I don't recall any characters with an "everything else" state, though several have states which don't cover the logical range of possibilities.

      And sure, you, Andrea and I can publish this in PeerJ. I'm registered with them.

  7. All this Ornithoscelida made me think of Eshanosaurus again... especially being Early Jurassic and all that.

    Did anyone compare Eshanosaurus to Chilesaurus?

    1. Could you bring that blog post up again?

    2. Not in a blog post, in my data matrix. I have both Chilesaurus and Eshanosaurus, basal ornithischians and basal sauropodomorphs: they never group together.

    3. Are you saying Chilesaurus comes up as a basal ornithischian, and Eshanosaurus comes up as a basal sauropodomorph?

    4. Honestly, they're not very similar. I personally think Eshanosaurus is a basal sauropod, while I doubt Chilesaurus has sauropodomorph affinities.

    5. I have to enter some basal sauropod for Eshanosaurus. Plateosaurus is included but they never form a clade.

  8. PeerJ is a great idea. Allows transparency and increased scrutiny during the review process. Since there are so many dud characters (correlated, nonexistent, miscoded, etc) gunking up the analysis, this extra scrutiny would really help.

    1. Does it actually happen in the dinosaur world? My preprint, on non-dinosaurs, has been up in two successive versions since mid-December 2015, and there's been only one comment – which concerns the statistical tests for whether certain topologies are really different; nobody seems to have taken a look at the matrix or even just the character list.

      BTW, publication in PeerJ PrePrints doesn't necessitate publication in PeerJ; you could take it elsewhere, too... not that I know why you would... :-)

    2. Hey, I tried critiquing multiple PeerJ preprints (e.g. Boyd and Pagnac's terrible Dakotadon analysis, Brownstein's problematic Arundel ornithomimosaur paper), but the former didn't change anything while the latter never acknowledged my feedback in the paper.

    3. Yes, that's the problem: the authors can choose to ignore the comments. If you yourself want to publish via PeerJ, it's your choice whether you take comments on board or not.

    4. @Ms. Mortimer,

      I just saw this, and I'm sorry I didn't acknowledge your feedback. I'll do so now by accepting it on the preprint! It was very helpful to me and help me consider alternative hypotheses, such as sexual or individual variation leading to the different morphotypes. Excuse my impoliteness there, as I hadn't realized I didn't acknowledge you in the final manuscript. During the week of the publication, they gave me some final checks, and after I had submitted them I realized I had not added an acknowledgement section for my reviewers and editor. Your comments must have slipped my mind when I emailed the PeerJ crew asking them to add the section. Again, I am so sorry for this mistake. I hope you are having a good summer!

      Regards,

      Chase Brownstein
      Research Associate,
      Stamford Museum and Nature Center

    5. Whoops, I meant to write "helped" instead of "help". Also, I've written an apology to you and the other fellow who wrote comments on the preprint on the PeerJ website if you don't catch it here. Again, I am so sorry about this.

      Regards,

      Chase

    6. Thanks for the update! We all make mistakes, and you did the right thing by fixing this one. All the best.

    7. No problem! I'm glad we were able to reconcile!

      Regards,

      Chase

  9. If you need a PeerJ co-author free of charge, let me know ;-)

    1. Would you be up for adding Elaphrosaurus, Ceratosaurus, Monolophosaurus and Piatnitzkysaurus to the matrix, since you added Allosaurus already? Would solve the coelophysoid-only problem we noted, and give Chilesaurus a chance to group there.

  10. I also am available for the low cost of free ;)

  11. Yay! The more coauthors, the better. I can probably start writing a skeleton draft tomorrow!

    1. I would say to check your email David, but it appears that the situation has long advanced since I first drafted that.

      Nick ;-)

    2. Excellent. I corrected the 29 characters that are shared with the Lori matrix with the 52 taxa the matrices share (spoiler- 18% of the codings were wrong). Now I'm correcting all taxa for the 14 characters that support Ornithoscelida in the resulting trees.

    3. Are you altering the character constructions or simply correcting the scorings of characters?

    4. Mostly just correcting, but that surangular foramen character 140 was one that supposedly supported Ornithoscelida (the state "only the foramen located laterally, posterior to the point of maximum mandibular depth remains open"). So I changed 139 and 140 to anterior and posterior surangular foramen closed, respectively.

    5. For other people who are scoring additional taxa, do you have a list of which characters you have modified?

      For people who haven't looked, the original characters are:

      139. Foramen located on the dorsal (and sometimes lateral) face of the surangular (surangular foramen): 0, present; 1, absent. NEW
      140. Surangular foramen: 0, both foramen (anterior, dorsally positioned and posterior, laterally positioned) remain open; 1 only the foramen on the dorsal surface of the surangular, anterior to or at the point of maximum mandibular depth remains open; 2, only the foramen located laterally, posterior to the point of maximum mandibular depth remains open.

      So, are the new characters now?
      139. Anterior surangular foramen: 0, open; 1, closed.
      140. Posterior surangular foramen: 0, open; 1, closed.

      Did you rescore the other taxa for each of these?

    6. Yeah, those are the new characters, and the other taxa are being rescored for them.
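
For anyone scoring extra taxa, the old and new versions of 139 and 140 can be lined up with something like this Python sketch. It's only a mechanical translation of the published states into the two new characters, assuming single-character scores, and the function name is made up. It is a starting point at best, since the original scores may themselves be wrong and the rescoring is not necessarily being done this way.

```python
def recode_surangular(old_139, old_140):
    """Map original scores to (anterior, posterior), where 0 = open and 1 = closed."""
    if old_139 == "1":
        # Original 139 state 1: no surangular foramen at all.
        return ("1", "1")
    by_old_140 = {
        "0": ("0", "0"),  # both foramina open
        "1": ("0", "1"),  # only the anterior (dorsal) foramen open
        "2": ("1", "0"),  # only the posterior (lateral) foramen open
    }
    # Anything else (e.g. "?" or "-") is left unknown for both new characters.
    return by_old_140.get(old_140, ("?", "?"))


print(recode_surangular("0", "2"))  # ('1', '0')
print(recode_surangular("1", "-"))  # ('1', '1')
print(recode_surangular("0", "?"))  # ('?', '?')
```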

      The only other altered character is supposed ornithoscelidan synapomorphy "large, quadratojugal has broad contact with the ventral margin of the descending process of the squamosal as a butt joint", state 1 of character 71, which is ordered. State 2 is "large, quadratojugal has broad contact with the posterior margin of the descending process of the squamosal as an elongate scarf joint." Looking at various taxa though, most have a somewhat diagonal contact between the elements. Since the character's ordered, combining states 1 and 2 won't have an effect on the matrix supporting a progression from 0 to 1 as an ornithoscelidan synapomorphy. Thus the character is now just about the extent of contact between the squamosal and qj- absent to point contact (0); broad contact (1). Note this is just the external contact, since the three dimensional relationships are often more complicated but hidden without disarticulation or CT scanning. For instance, Euparkeria would seem to have a narrow point contact, but actually the quadratojugal's dorsal process has a posterior plate-like section overlapped by the quadrate that is sandwiched between that bone and the squamosal's ventral process (Ewer, 1965).

  12. All of this sounds great! Andrea, given your recent publication history (I'm thinking of your PeerJ paper on the sampled-ancestor dating of dipnoans), would you consider the possibility of running a Bayesian and/or likelihood analysis on the corrected matrix as well, just to see how robust the topology is to the choice of method? (I can help with that if anyone is interested.)

    1. This would be something that would be interesting to see-- not only the corrected matrix but the original matrix too.

    2. As far as the original matrix goes, analyzing the matrix with the default parameters in MrBayes yields more or less the same result as the parsimony analysis--i.e., it recovers Ornithoscelida.

      For the very reason of Bayesian analysis, though, I don't find the presence of parsimony-uninformative characters in the matrix particularly troubling. Perhaps Baron et al. should have stated that they included numerous autapomorphies in the matrix, but they don't affect parsimony analyses. Bayesian analyses, on the other hand, benefit from autapomorphies.

    3. I don't know much about Bayesian analyses, but surely autapomorphies are only beneficial if they're fairly well sampled? In other words, if taxa which actually have more autapomorphies are coded for more parsimony uninformative characters in the matrix. But if you have 75 taxa and only 47 autapomorphies like Baron et al., that doesn't seem plausible. Is that an accurate assessment?

    4. That's a fair point. Obviously the ideal (for a Bayesian analysis) would be to sample every character which varies phylogenetically within the taxonomic sample, and this is nowhere close to that.

      But is a somewhat haphazard sampling of autapomorphies better or worse than none at all? I'm not familiar with any studies which have actually tested this, but since there's probably a rough correlation between the number of autapomorphies in the matrix and the total number of autapomorphies, and the parsimony-informative characters are also susceptible to sampling bias, maybe it's not as big a problem after all.

      If nothing else, once you're considering Bayesian analysis, the problem will become whether there are enough autapomorphies, not that there are any.

    5. Ornithopsis: if you have the Baron et al. dataset in Nexus format, could you send it to me (david dot cerny one at gmail dot com)? I'd be interested in seeing what the posterior probabilities are for individual clades, as well as trying out some non-default options.

      Mickey: Bayesian inference actually assumes that you have a random sample of characters: under most conditions, such a random sample would presumably include some parsimony-informative characters, a couple of autapomorphies, and a whole lot of invariant characters. Collecting a sample like that is easy for molecules, but pretty much impossible for morphology. That's why sampling corrections have been introduced to account for the fact that most morphological matrices don't have invariant characters and autapomorphies in them -- this is referred to as "ascertainment bias" in the literature. There has been a paper showing that the models of morphological evolution which correct for such a bias still retain the desirable statistical properties of Bayesian inference (statistical consistency) as long as you have at least 8 tips in your tree.

      My assumption is that the best way to analyze Baron et al.'s data (or any other morphological dataset) would be to discard all parsimony-uninformative characters and introduce the appropriate correction. If that's not an option (I think BEAST doesn't offer the corrected models, unless you are willing and able to implement them manually in your XML file), then having some autapomorphies is almost certainly better than having none, even if they are undersampled.

      "Obviously the ideal (for a Bayesian analysis) would be to sample every character which varies phylogenetically within the taxonomic sample, and this is nowhere close to that."

      I don't think that's correct, for the reasons mentioned above.

    6. Interesting. I wouldn't have guessed invariant characters mattered. Shows how little I know. In any case, I don't think Baron et al.'s 'autapomorphies' were intended to be such. I think they're mostly mistakes. Looking at them...

      Character 1 is "Skull proportions: 0, preorbital skull length more than 45% of basal skull length; 1, preorbital length less than 45% of basal skull length (modified from Butler et al., 2008)." It's only coded as 1 in Eoraptor, but off the top of my head I checked Tianyulong and found a 43% ratio, while Eoraptor's is 50%.

      Character 32 is "Additional opening(s) in the antorbital fenestra (promaxillary foramen), shape: 0, wide and circular; 1, narrow recess or slit-like. NEW" It's only coded state 1 in Heterodontosaurus, but this is also found in e.g. Dilophosaurus and kayentakatae (wrongly coded 0 and ? respectively by the authors).

      Character 44 is "Lacrimal, shape: 0, dorsoventrally short and block-shaped; 1, dorsoventrally elongate and shaped like and inverted L (Rauhut, 2003; Ezcurra, 2010)" and is only coded state 0 in Massospondylus carinatus. Not only does Massospondylus have an elongate lacrimal, but taxa that obviously don't, like Postosuchus, are coded as state 1. So that character doesn't match reality at all. I actually quadruple checked it just to make sure I was looking at the right column in NDE.

      And those are the first three 'autapomorphic' characters in the matrix. I didn't cherry-pick them to be bad examples. Another indication of the issues with this matrix...

    7. Others preceded me, in the comments on Bayesian analysis.
      It is not a problem to use this matrix for a Bayesian analysis.
      Also, BEAST2 can take into account the way a lack of autapomorphies biases terminal branch estimation.

      I may perform a tip-dating analysis, but need ages for all taxa.
      Also, such analyses are time consuming... and I have other projects that have immediate priority.

      Give me a couple of weeks...

  13. From the last I've read (cited in my preprint, where there's a section justifying why we only used parsimony), lack of parsimony-uninformative characters is not a problem for Bayesian analysis (presumably when the corrections mentioned above are used). However, missing data can drastically mislead Bayesian inference – strong support for wrong trees – if they don't have highly unrealistic distributions.

    On top of that, for a matrix this size, not much difference in the performance of parsimony and Bayesian inference is expected.

    Nick, I got the e-mail and will reply in a few hours!

    Mickey, you still use NDE? Why not Mesquite, which also lets you trace characters on trees and do lots more stuff?

    So that character doesn't match reality at all. I actually quadruple checked it just to make sure I was looking at the right column in NDE.

    Could the state numbers of that character simply be inverted between the matrix and the character list? I've seen this happen before (documented in my preprint).

    Replies
    1. I downloaded Mesquite, but NDE is much more intuitive. Maybe I didn't give Mesquite enough of a chance.

      And no, dinosaurs in general have elongate L-shaped lacrimals and are coded that way, so the ol' inverted character state situation can't apply to this either.

    2. I use NDE too.
      Mesquite is terrible... at least for a computer moron like me.

    3. That's really strange. I mean, earlier versions of Mesquite crashed a lot, but that's a thing of the past... and handling a matrix isn't actually any different than in NDE. (Does NDE let you shift characters or taxa around, though, and does it let you change the order of states without you having to do that manually by changing each cell?)

    4. Yes, it does. And that is what I need. Mesquite is redundant for what I need to do to a matrix. The only tedious thing about NDE is that new taxa are added at the end of the matrix, but I like them to be ordered alphabetically. This has to be done manually, and with >400 OTUs it is such a pain...

    5. Andrea, you could open your NEXUS file in Notepad, copy and paste the matrix into Word, sort A-Z, then re-paste into the NEXUS file. Repeat for the taxon list if you use that.

    6. Thanks for the suggestion, Nick!

      Just a note: my taxon list is arranged into 3 blocks:
      1.well-preserved taxa
      2.fragmentary but worthy of being retained.
      3.very fragmentary, possible chimeras, or not yet published.

      So some manual editing is necessary every time anyway.

    7. So, that's easy to solve as well: you can copy the entire nexus or tnt file into Word, and the A-Z function will only sort selected text. Just select the blocks you want to sort, then use the A-Z. I believe LibreOffice has this same feature.
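
      The same block-wise sort could also be scripted rather than done in a word processor. Here is a minimal Python sketch, assuming each matrix row sits on one line that starts with the taxon name and that the blocks are separated by a blank line. The function name and the sample rows (including the scores) are made up for illustration, not taken from any real matrix:

```python
def sort_blocks(matrix_text):
    """Sort rows A-Z (case-insensitive) within each blank-line-separated block.

    Assumes one taxon per line, with the taxon name as the first token.
    The order of the blocks themselves is left untouched.
    """
    blocks = matrix_text.strip("\n").split("\n\n")
    sorted_blocks = []
    for block in blocks:
        rows = [row for row in block.split("\n") if row.strip()]
        rows.sort(key=lambda row: row.split()[0].lower())
        sorted_blocks.append("\n".join(rows))
    return "\n\n".join(sorted_blocks)


# Hypothetical rows: two blocks (well-preserved taxa, then fragmentary ones).
example = """Tawa 0101?
Eoraptor 1100?
Herrerasaurus 0010?

Saltopus ????1
Agnosphitys ??0?1"""

sorted_example = sort_blocks(example)
print(sorted_example)
```

      Paste the matrix rows (not the whole file) into `example`, then copy the printed result back into the NEXUS or TNT file.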

    8. David M.: I swear I didn't mean for this to turn into a discussion of the relative merits of parsimony and Bayesian inference, but the points you raised deserve to be addressed. I was curious to see what justification for only using parsimony you offered in your preprint, and I have to say I'm not convinced:

      First, our dataset contains 19 unordered multistate characters, 38 ordered multistate characters and 4 characters with more complex stepmatrices even in Analyses R1–R6 where (see above) bone losses are not scored as irreversible. To the best of our knowledge, the available programs in which parametric methods are implemented cannot handle such a combination.

      Both MrBayes and BEAST can handle ordered as well as unordered multistate characters. I guess the "4 characters with more complex stepmatrices" could still pose a problem, but I doubt the tree would change drastically if they were analyzed in a more simplistic fashion.

      Your second argument essentially says that the missing data distributions simulated by Simmons are more likely to occur in real-life morphological matrices than those simulated by Wright and Hillis, and that this is bad news for parametric methods. That's problematic on multiple levels: (1) Since Simmons designed his simulations to mimic molecular supermatrices made up of multiple partitions, it's not obvious why the premise should be true. (2) More specifically, is it true of the particular dataset you are working on? (3) In 5 out of the 9 scenarios shown in Simmons's Figure 3, Bayesian analysis performs equally well or better than parsimony even for the smallest analyzed dataset – hardly an indication that missing data "can drastically mislead Bayesian inference".

      Third, for matrices with high average evolutionary rates (therefore large amounts of noise, which translates to low consistency indices of MPTs), the performance of Bayesian inference and maximum parsimony converges when the number of characters increases far enough (Wright & Hillis, 2014: fig. 6; O’Reilly et al., 2016).

      Even if we accept that, it's irrelevant to the analysis you present in your preprint. With 276 characters, your dataset is much closer to the 350-character case simulated by both Wright & Hillis (2014) and O'Reilly et al. (2016), in which Bayesian inference strongly outperforms parsimony regardless of the rate of evolution.

      As expected, maximum parsimony is immune to this problem: it simply represents a lack of data as a lack of resolution.

      That's a strange assertion, since the very paper you cite in the previous sentence shows the exact opposite: according to O'Reilly et al. (2016), parsimony gives you false precision – i.e., trees that are better resolved but less accurate.

      Last but not least, parametric analyses are inherently much more time-consuming than nonparametric ones; our maximum-parsimony analyses took about four weeks in total, not including the bootstrap tests which also used maximum parsimony and took another four weeks.

      I find this shocking: I thought that parsimony was at least supposed to be fast! For what it's worth, analyzing one of the nexus files given in your supplementary information using the parallel version of MrBayes 3.2 on an Intel Core i7 machine (MCMC settings: ngen=10000000 nchains=4 temp=0.05) took me 183 minutes. Note that this was only enough to reach an average SD of split frequencies of less than 0.05; if you want less than 0.01, you'll need to run it longer.
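
      For anyone unfamiliar with that diagnostic: the average SD of split frequencies compares how often each split (bipartition) turns up in the independent runs. Here is a rough Python sketch of the idea, assuming the per-run split frequencies have already been tallied. The split labels and numbers are invented, the use of the sample standard deviation is an assumption, and `min_freq` only approximates the kind of cutoff MrBayes applies to rare splits:

```python
from statistics import mean, stdev


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    `runs` is a list of dicts mapping a split label to its frequency
    in one run. Splits absent from a run count as frequency 0.0.
    Splits that never reach `min_freq` in any run are ignored.
    """
    all_splits = set().union(*runs)
    deviations = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))  # sample SD (assumption)
    return mean(deviations) if deviations else 0.0


# Hypothetical split frequencies from two independent runs.
run1 = {"AB|CDE": 0.90, "ABC|DE": 0.60}
run2 = {"AB|CDE": 0.94, "ABC|DE": 0.50, "AC|BDE": 0.02}

value = asdsf([run1, run2])
print(round(value, 4))
```

      The closer the runs agree on how often each split occurs, the closer this value gets to zero, which is why thresholds like 0.05 or 0.01 are used as stopping rules.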

      Then again, why not use CIPRES instead of your local machine?

      On top of that, for a matrix this size, not much difference in the performance of parsimony and Bayesian inference is expected.

      I actually mostly agree with that (except for the part about size), but performance is not the only reason why Bayesian inference is preferable to parsimony. You can simply do more stuff with branch lengths and posterior distributions.

    9. Yay! Feedback on the preprint!!! :-) :-) :-) Too many characters for a single comment:

      First of all, thanks to Brad McFeeters for very recently pointing out that we cite Simmons (2011a, b) in the text while there's only one "Simmons (2011)" in the references list. That's the 2011a one. 2011b is:
      Simmons MP. 2011b (printed 2012). Radical instability and spurious branch support by likelihood when applied to matrices with non-random distributions of missing data. Molecular Phylogenetics and Evolution 62:472–484. DOI 10.1016/j.ympev.2011.10.017

      Fun fact: the whole section is there to console a reviewer who lamented that we hadn't tried model-based methods. In the next round, another reviewer said we should remove the whole section because parsimony is "industry standard" and doesn't need to be justified...

      Both MrBayes and BEAST can handle ordered as well as unordered multistate characters.

      In the same matrix? Good to have confirmation.

      I guess the "4 characters with more complex stepmatrices" could still pose a problem, but I doubt the tree would change drastically if they were analyzed in a more simplistic fashion.

      For 2 of them it would actually make sense to code them differently, and I'll try that in the next paper. For the other 2 it would not. The first 2 were coded about the same way, just unordered, in the original, and we're trying to make as few changes as necessary...

      One (I'm not sure which one) is so weird that PAUP* can't determine if it's even parsimony-informative.

      Can MrBayes and BEAST handle irreversible characters (Analyses R7–R12, and transitions away from state 5 of ch. 276 in all analyses)?

      Your second argument essentially says that the missing data distributions simulated by Simmons are more likely to occur in real-life morphological matrices than those simulated by Wright and Hillis, and that this is bad news for parametric methods. That's problematic on multiple levels: (1) Since Simmons designed his simulations to mimic molecular supermatrices made up of multiple partitions, it's not obvious why the premise should be true.

      As stated in the preprint, Wright & Hillis (2014: fig. 2) partitioned their matrices by rate of evolution, and removed all data for one rate category for some taxa. In morphological matrices, missing data are clustered by body part, not by rate of evolution. The way Simmons distributed missing data in his contrived and simulated matrices has nothing to do with rate, which is assumed to be uniform. The examples look realistic enough from morphology.

      (2) More specifically, is it true of the particular dataset you are working on?

      Missing data is very strongly clustered by body part. There are several taxa that are a skull, a whole skull, and nothing but the skull. In many others, the whole tail but little else is missing. Carpus & tarsus are often unossified; metapodials & digits are often incompletely preserved; and so on. Conversely, an appreciable number of characters can be scored for all taxa except the isolated lower jaws that I added.

    10. Part 2 of 2:

      (3) In 5 out of the 9 scenarios shown in Simmons's Figure 3, Bayesian analysis performs equally well or better than parsimony even for the smallest analyzed dataset – hardly an indication that missing data "can drastically mislead Bayesian inference".

      The difference between 5 and 9 is the difference between "can" and "always does".

      With 276 characters, your dataset is much closer to the 350-character case simulated by both Wright & Hillis (2014) and O'Reilly et al. (2016), in which Bayesian inference strongly outperforms parsimony regardless of the rate of evolution.

      Er, yes. (They both used only binary characters, but the character states in our matrix are equivalent to something like 330 or perhaps 340 binary characters, not 1000.) It's not impossible that I'll end up at 1000 characters at the end of this project, but I'm not there yet.

      However: what are the units of evolutionary rate in the figures of Wright & Hillis (2014)? For 350 characters, the error rate of parsimony decreases faster than that of Bayesian inference does as the average rate of evolution increases beyond 2.5.

      That's a strange assertion, since the very paper you cite in the previous sentence shows the exact opposite: according to O'Reilly et al. (2016), parsimony gives you false precision – i.e., trees that are better resolved but less accurate.

      The dash between the sentences is like half a paragraph break. The citation for parsimony representing lack of data as lack of resolution is in the sentence after that: it's Simmons (2011a, b), who found that model-based methods can find OTUs that have no scores at all in common as sister-groups while parsimony (as of course expected) cannot. Parsimony can be misled by data (O'Reilly et al., 2016), model-based methods can be misled by lack of data (Simmons, 2011a, b). When parsimony lacks the data to resolve a trichotomy, it just doesn't resolve it, while model-based methods try anyway and sometimes get it spectacularly wrong (Simmons, 2011a, b). I should word that more clearly, thanks for pointing this out.

      I find this shocking: I thought that parsimony was at least supposed to be fast!

      Oh, it is. But the taxon sample is large, and some characters are weird. Plus, PAUP* doesn't do parallel – I think the new alpha versions do, but they can't deal with this matrix at all: within a second or something, they spit out a completely unresolved single MPT that was hit in every one of the 10,000 replicates. Must be due to the strange characters that PAUP* 4.0b10 can handle.

      Then again, why not use CIPRES instead of your local machine?

      ...Because I didn't know CIPRES existed. Thanks for pointing it out, I'll definitely try it...!

      You can simply do more stuff with branch lengths and posterior distributions.

      Any branch lengths from this matrix are spurious. As stated in the preprint (start of the Discussion), the character sample is much smaller than it could be, and it isn't random either (so just multiplying all branch lengths by 3 or 4 won't help). Posterior probabilities – that's where Simmons (2011a, b) comes in: missing data make them unreliable. With bootstrap values we can at least tell why those of certain nodes are so pathetically low (missing data in certain taxa).

    11. In the next round, another reviewer said we should remove the whole section because parsimony is "industry standard" and doesn't need to be justified.

      I agree, and would also be in favor of removing it.

      In the same matrix?

      Yes. Simply add something like this to the mrbayes block in your nexus file:

      ctype ordered: 3 21 27;

      Can MrBayes and BEAST handle irreversible characters [...]?

      MrBayes can do it out-of-the-box; it's as simple as adding the following line to your nexus file:

      ctype irreversible: 12 18 45 53 60 71 77 81 113 146 156 159 162 166 170 174 276;
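
      To show how those lines fit together, here is a small Python sketch that writes out a complete mrbayes block. The helper name is hypothetical. The character numbers are the ones quoted in this comment and the mcmc settings are the ones from my earlier run; which characters should actually be ordered in your matrix is for you to decide. Each MrBayes command ends with a semicolon:

```python
def mrbayes_block(ordered, irreversible, ngen, nchains, temp):
    """Return the text of a mrbayes block to append to a NEXUS file."""
    lines = ["begin mrbayes;"]
    if ordered:
        lines.append("  ctype ordered: " + " ".join(map(str, ordered)) + ";")
    if irreversible:
        lines.append(
            "  ctype irreversible: " + " ".join(map(str, irreversible)) + ";"
        )
    lines.append(f"  mcmc ngen={ngen} nchains={nchains} temp={temp};")
    lines.append("end;")
    return "\n".join(lines)


block = mrbayes_block(
    ordered=[3, 21, 27],  # illustrative numbers from the comment above
    irreversible=[12, 18, 45, 53, 60, 71, 77, 81, 113,
                  146, 156, 159, 162, 166, 170, 174, 276],
    ngen=10000000,
    nchains=4,
    temp=0.05,
)
print(block)
```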

      In BEAST, it should be possible to implement irreversibility as well as arbitrarily complex stepmatrices, but you may need to do it manually in your XML file.

      As stated in the preprint, Wright & Hillis (2014: fig. 2) partitioned their matrices by rate of evolution, and removed all data for one rate category for some taxa. In morphological matrices, missing data are clustered by body part, not by rate of evolution.

      Those two are not mutually exclusive. Phenomena like mosaicism and modular evolution (i.e., different body parts evolving at different rates) have been known for a long time, and there was a paper in Syst. Biol. by Clarke and Middleton showing that it might be important to account for them in model-based morphological phylogenetics. Wright and Hillis cited it; I assume that's where the idea behind their simulations came from.

      The difference between 5 and 9 is the difference between "can" and "always does".

      Fair point; what I wrote didn't make sense. The obvious point I should have made instead is that the 5-to-9 ratio is an argument for, rather than against, using Bayesian inference.

      However: what are the units of evolutionary rate in the figures of Wright & Hillis (2014)?

      Changes per character.

      For 350 characters, the error rate of parsimony decreases faster than that of Bayesian inference does as the average rate of evolution increases beyond 2.5.

      I think this is just clutching at straws. First of all, I'm not even sure that the curves superimposed on the scatterplot are supposed to be trendlines of any sort, so it's hardly safe to draw conclusions like this from the figure (Wright and Hillis themselves certainly didn't do it.) Second, even as the rate increases beyond 2.5, all I can see is two almost perfectly parallel lines quite far apart from each other. In short, there's no way to construe the figure as anything but support for BI.

      When parsimony lacks the data to resolve a trichotomy, it just doesn't resolve it, while model-based methods try anyway and sometimes get it spectacularly wrong (Simmons, 2011a, b).

      You seem to be worried that BI is bound to give you a bunch of spurious nodes that have no support in the data. What I find strange is that you don't seem to be interested in testing this empirically. For what it's worth, the 50% majority-rule consensus I got from my short MrBayes run had a lot more polytomies in it than any of the parsimony trees shown in your preprint. Note that this contradicts Simmons (2011) but agrees very well with O'Reilly et al. (2016). Now, I didn't order any of the multistate characters or code any of them as irreversible, and it's possible that the resolution would have improved had I done that. Still, it goes to show that the Simmons paper can't be used as an excuse for not running a Bayesian analysis in the first place.

    12. Plus, PAUP* doesn't do parallel – I think the new alpha versions do, but they can't deal with this matrix at all: within a second or something, they spit out a completely unresolved single MPT that was hit in every one of the 10,000 replicates. Must be due to the strange characters that PAUP* 4.0b10 can handle.

      You see? One more reason to come over to the Bayesian dark side! We have continually updated, user-friendly software that has actually kept up with modern computing. :-)

      Any branch lengths from this matrix are spurious. As stated in the preprint (start of the Discussion), the character sample is much smaller than it could be, and it isn't random either (so just multiplying all branch lengths by 3 or 4 won't help).

      Once again, it's not obvious to me what the problem is. In particular, "the character sample is much smaller than it could be" is true of every analysis ever performed that wasn't based on whole genomes. Also, it could be interesting to see what exactly the effects of non-random character sampling are by partitioning your matrix by anatomical region and unlinking branch length across the partitions.

      Posterior probabilities – that's where Simmons (2011a, b) comes in: missing data make them unreliable. With bootstrap values we can at least tell why those of certain nodes are so pathetically low (missing data in certain taxa).

      There is a reason why I wrote "posterior distributions" and not "posterior probabilities". Having a sample from the posterior rather than a point estimate is extremely useful when it comes to accounting for topological uncertainty in downstream analyses (trait evolution, biogeography, etc.).

    13. I agree, and would also be in favor of removing it.

      Huh. That seems to contradict everything else you've written here, including in the same comment.

      MrBayes can do it out-of-the-box; it's as simple as adding the following line to your nexus file:

      Just like PAUP*, then. Good.

      Those two are not mutually exclusive. Phenomena like mosaicism and modular evolution (i.e., different body parts evolving at different rates) have been known for a long time, and there was a paper in Syst. Biol. by Clarke and Middleton showing that it might be important to account for them in model-based morphological phylogenetics. Wright and Hillis cited it; I assume that's where the idea behind their simulations came from.

      Makes sense, but there's no way that, say, the entire skull or the entire dermatocranium evolves at the same rate – and we barely have an idea of what the "modules" are, except that they're different in different taxa.

      The obvious point I should have made instead is that the 5-to-9 ratio is an argument for, rather than against, using Bayesian inference.

      5 out of 9 times it's as good or better, 4 out of 9 times it's worse? At least half of the time there won't be a point to using it.

      Changes per character.

      ...That's weaksauce, then. The figures barely reach an average of 3 changes per character. My trees all have consistency indices below 0.2, which means an average of 5 changes per character per tree. (There are some that change only once, and some that change states over 30 times per tree.) We're completely outside the window Wright & Hillis investigated.
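
      The arithmetic behind that, as a minimal Python sketch: the ensemble consistency index is the minimum possible number of steps divided by the observed number, so the average number of changes per character is the minimum per character divided by the CI. Treating every character as binary (minimum of one step each) is a simplifying assumption; with multistate characters the minimum is higher, so 5 is a lower bound. The function name is made up:

```python
def mean_changes_per_character(ci, min_steps_per_char=1.0):
    """Average observed steps per character implied by an ensemble CI.

    CI = (minimum possible steps) / (observed steps), so
    observed steps per character = min steps per character / CI.
    """
    if not 0.0 < ci <= 1.0:
        raise ValueError("CI must be in (0, 1]")
    return min_steps_per_char / ci


changes = mean_changes_per_character(0.2)

# Implied tree length for 276 characters, if all were binary (assumption).
implied_length = 276 * changes
print(changes, implied_length)
```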

      What I find strange is that you don't seem to be interested in testing this empirically.

      Doesn't seem worth the effort. As I mentioned, I haven't touched MrBayes in 11 years, so there is an effort involved before we even get to the stepmatrices.

      For what it's worth, the 50% majority-rule consensus I got from my short MrBayes run had a lot more polytomies in it than any of the parsimony trees shown in your preprint. Note that this contradicts Simmons (2011) but agrees very well with O'Reilly et al. (2016).

      ...That's actually not surprising, because Simmons investigated what happens in molecular supermatrices where taxa may not have any scored characters in common. We have isolated skulls without lower jaws or nearly so, and I added isolated lower jaws, so there are now a few cases of this, but really not many. However, O'Reilly et al. didn't even mention missing data, and Wright & Hillis tested a quite different distribution of missing data.
      Ordering characters has strong unpredictable effects, in parsimony anyway.

      In particular, "the character sample is much smaller than it could be" is true of every analysis ever performed that wasn't based on whole genomes.

      Actually several issues. Molecular analyses that haven't used the whole genome have generally tried to pick particularly informative parts of it, as far as I'm aware. (Whether they've succeeded in doing that is a separate question.) Here, the criteria for character sampling are almost purely historical: the matrix is put together from earlier matrices and lists of diagnostic character states.

      Also, it could be interesting to see what exactly the effects of non-random character sampling are by partitioning your matrix by anatomical region and unlinking branch length across the partitions.

      Anatomical region isn't the only way the sampling is nonrandom. It would be interesting otherwise.

      a sample from the posterior rather than a point estimate

      Oh, that's what you mean. Yes, that's useful for things we can't do yet because our matrix isn't good enough for them. :-)

    14. Huh. That seems to contradict everything else you've written here, including in the same comment.

      Well, I can think of two scenarios that would be (IMO) preferable to keeping the section in the paper. Either you find the arguments above convincing enough to run a Bayesian analysis on your dataset, in which case there is nothing to be justified. Or you decide to stick to parsimony, in which case the "industry standard" argument frankly carries more weight than any of the four you currently offer in your preprint, even if it's left implicit. Your point 1 is based on misconceptions about what current software can and cannot do; point 3 relies on what's at best a very shaky interpretation of two studies that may be ultimately irrelevant to your analysis (and whose take-home message actually contradicts the conclusions you draw from them); and point 4 can be easily solved by taking advantage of publicly available computing resources. Only point 2 is both relevant and not obviously incorrect; that said, I don't think it supports the claim you want to make (i.e., that at least for your particular dataset, it makes more sense to use parsimony than parametric methods).

      In conclusion: I agree that there's no point in running both parsimony and Bayesian analyses. If you face that choice, just use BI: it will let you do more stuff with your trees, and the trees themselves will probably be better, too, albeit not by a large margin (there will probably be few differences, and those will mostly have to do with resolution). Using parsimony only for the sake of convenience ("I haven't touched MrBayes in 11 years") is not a huge problem, either, if you care exclusively about topology. Probably the only thing you shouldn't do is try to justify your reliance on parsimony by claiming that it is somehow uniquely suited to the problem at hand – which is what you currently do in your preprint.

      Makes sense, but there's no way that, say, the entire skull or the entire dermatocranium evolves at the same rate

      But there is no assumption that it does. Each partition can have a discrete gamma distribution of among-character rate heterogeneity associated with it if it wants it.

      we barely have an idea of what the "modules" are, except that they're different in different taxa.

      Which is why the assignment of characters to partitions is not done a priori, but rather estimated from the data. The recently published 2nd version of PartitionFinder (Lanfear et al. 2017, Mol. Biol. Evol. 34: 772–3) can do just that even for morphological datasets.

      My trees all have consistency indices below 0.2, which means an average of 5 changes per character per tree.

      More than that, actually, since parsimony doesn't correct for multiple hits – otherwise it wouldn't be parsimonious.

      We're completely outside the window Wright & Hillis investigated.

      Not completely so; your data and their simulations have overlapping distributions of rates. But yeah, the mean of your distribution is shifted to the right compared to those investigated by Wright and Hillis.

    15. ...That's actually not surprising, because Simmons investigated what happens in molecular supermatrices where taxa may not have any scored characters in common.

      And there might be other reasons, too. Simmons summarized his posterior distributions by taking "a strict consensus [...] of the tree(s) with the highest posterior probabilities that together sum to ≥ 0.5". I don't know of any Bayesian package that does that (BEAST uses the maximum clade credibility tree; MrBayes assembles the consensus tree out of all bipartitions occurring in more than 50% of MCMC samples), but it seems like a decent attempt to allow for a lack of resolution. In his Figure 4, though, he presented the maximum a posteriori (MAP) tree. MAP trees are always fully resolved, which is one of the reasons why they are not used to summarize the posterior (Holder et al. 2008, Syst. Biol. 57: 814–21).
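
      To make the contrast concrete, here is a toy Python sketch of the majority-rule idea: count how often each clade appears across the sampled trees and keep only those found in more than half of them. The trees are invented and represented simply as sets of clades (each clade a frozenset of tip labels), which is a simplification of what real packages do with unrooted bipartitions:

```python
from collections import Counter


def majority_rule_clades(sampled_trees, threshold=0.5):
    """Return {clade: frequency} for clades found in > threshold of trees."""
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    n = len(sampled_trees)
    return {
        clade: count / n
        for clade, count in counts.items()
        if count / n > threshold
    }


# Four hypothetical posterior samples for tips A-E.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AB"), frozenset("ABD")}
t3 = {frozenset("AC"), frozenset("ABC")}
t4 = {frozenset("AB"), frozenset("ABD")}

consensus = majority_rule_clades([t1, t2, t3, t4])
print(consensus)
```

      Here only the clade A+B survives, because A+B+C and A+B+D each appear in exactly half of the samples. The summary tree therefore has a polytomy where the samples disagree, whereas any single sampled tree (such as a MAP tree) would be fully resolved.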

      Actually several issues. Molecular analyses that haven't used the whole genome have generally tried to pick particularly informative parts of it, as far as I'm aware.

      I'd say that gene sampling in pre-whole genome analyses has been driven mostly by technical convenience (with respect to collection, extraction, alignment, and so on). There has been a clear bias toward transcribed regions in general and protein-coding genes in particular, for example. The cumulative character of morphological matrices that you describe also holds for phylogenomic datasets; many analyses are simply run on everything that can be downloaded from GenBank for the group of interest.

    16. I talked to my coauthor. We'll actually try MrBayes in CIPRES. :-)

      Each partition can have a discrete gamma distribution of among-character rate heterogeneity associated with it if it wants it.

      Oh. Awesome.

      estimated from the data

      Better yet!

    17. The analysis is running, but not in CIPRES, because finding out how many million generations it takes is a matter of trial & error. I split the 4 characters with stepmatrices into 2 or 3 ordered or unordered characters each (+ 1 irreversible for limb loss) and set 4 instead of the usual 2 simultaneous runs. It's converging very slowly; an average SD of 0.05 was reached somewhere between 6 and 7 million, we'll see if 25 million is enough to get to 0.01 (20 million weren't) or if it'll take 30.

  14. Interesting discussion here... just want to add that I also have the apparently weird habit of checking supp info to submitted manuscripts, the painful thing being that authors and editors have sometimes not taken the changes on board or have glossed them over as irrelevancies.

    Replies
    1. Good to know there are four of us now :-)

    2. Oh, I read it too. The Supp Info is often where the 'guts' of the paper are, not just the pitch. In my own papers (non-paleo) Supp Info is often where we put some of the most interesting findings, as well as the hard data.

    3. Five. Our five – among our reviewers who read supp. inf. are... :-)
