Tuesday, May 31, 2011

The myth of coding from specimens firsthand and the untapped resource of photos

You've probably heard it many times: advice from professional paleontologists about the proper way to code specimens.  For instance, here's Brochu on the DML in 2000:

"One thing I've noticed as associate editor of JVP is that reviewers are growing less patient with phylogenetic analyses that do not address the specimens themselves, and which instead code taxa from publications. This is being viewed increasingly as unacceptable, and I wholeheartedly embrace that view. It's the specimens that are our primary data."

I completely agree that the specimens are our primary data and that coding from specimens is preferable to any other resource.  When I was younger, back in 2000 or so, I would picture a paleontologist poring over a specimen in his hands, turning it this way and that under the light, only to triumphantly type a 0 or 1 into Nexus Data Editor and move on to the next character.  If only the world were so kind.  The dirty truth is that this is generally not the way things work, and indeed can't be, given financial and business considerations.

Any decent cladistic analysis needs a large number of taxa, and for most analyses this means specimens will be spread over the world.  For the original TWG analysis of Norell et al., seeing all the relevant specimens would mean going to the AMNH, BMNH, BPM, BSP, BYU, CEU, CMN, DINO, FMNH, GMV, HMN, IGM, IVPP, JM, LH, MNU, MOR, MUCP, NGMC, PIN, PVPH, ROM, RTMP, UA, UCMZ, USNM, WDC, YPM and ZPAL collections.  That's China, Mongolia, Russia, Argentina, Poland, England, Spain, Germany, Canada and over ten states of the US.  If you're lucky, you'll see the specimens in a traveling exhibit (with the caveat that this usually makes them harder to examine up close) or on loan to another museum.  Many museums have casts, but these are of varying quality.  Realistically, very few paleontologists are going to have the resources to see all the specimens.  Travel cost is simply too high.

But people do manage to travel, and many papers indicate specimens were consulted for coding.  I myself visited the AMNH twice, and they happened to have many IGM specimens at the time as well.  When I write my papers, I'll put down my reference for Saurornithoides as "AMNH 6516".  But the truth is my codings don't come from looking at the specimen in person.  I saw it, I held it, sure.  But when you visit a museum collection, you get 6 hours or so per day, since they're only open for so long.  And there are usually several relevant specimens in a museum, sometimes an extremely large number (AMNH, IGM, IVPP, MOR, RTMP, etc.).  Moreover, there are usually rules about removing only one specimen from cabinets at a time, filling out cards to replace them in the meantime, etc.  And you want to be careful, since nobody wants to be "the one who dropped Ornitholestes' skull".  If I were to try to code Ornitholestes for the TWG matrix while looking at it in the AMNH collections, it would almost certainly take my entire time for that day and more.  Any good matrix has at least a couple hundred characters, often several hundred.  It takes time to code.  And while people have the resources to visit museum collections, I highly doubt most have the resources to return every day for a week or two.  And realistically, matrices aren't made by having a list of characters and running through them for every taxon, one taxon at a time.  Often comparing taxa will lead to new interpretations (as in my therizinosaur accessory trochanter example), or a taxon's morphology will lead you to redefine your states or add a new character.  Who's going to go back to New York to see if Ornitholestes has more than ten maxillary teeth after they've rewritten their character to be "11 or more teeth" instead of "9 or more"?  And once you have a new/revised matrix several years down the line, and new taxa have been discovered, are you supposed to go on your whirlwind worldwide tour again?
Curators can do these things for specimens in their care, as can other researchers who live by a museum or have specimens on loan to them, but nobody can do them for the majority of specimens.

So how do people "code from specimens"?  They take photos.  Lots of photos.  And they code from those.  Their own photos are often better than the literature because they're in color and from as many angles as you want, though with internet publication the quality of published figures is improving.  There would be almost no reason to see Australovenator for myself, for instance, since Hocknull et al. did such a good job of photographing it.  There are certainly things photographs don't show well: sutures and restoration on some specimens, depth of depressions, some texture.  But these are hardly numerous enough to justify spending hundreds of dollars to see a specimen yourself.  "The literature" has gotten a bad name, but its photos can be just as good as your own, and its descriptions are usually written by people with as much or more knowledge and experience as you.  This is good news for all of us though, since it means anyone can have access to the same resources the professionals use for most specimens, without travel costs.  The internet's gone a long way to providing a Shiny Digital Future for publication access, but I think we could do more.

What if there were an online database of specimen photos, in high resolution color, that anyone could access?  The museums' permission would be needed of course, and undescribed specimens could be excluded if under study, but it sure beats everyone spending their resources to photograph the same things.  It's also better than the current situation, where people have photos of poorly described specimens but aren't allowed to distribute them, even if the specimens have been in the literature for over a decade and there are no plans for redescription.  The odd thing is, a person is generally allowed to travel to the collection and take their own photos, but not receive or distribute those which have already been taken.  I don't want people to think I'm just bitter about lacking access myself, as there are plenty of specimens I have photos of (both taken myself and kindly provided by others) but am not allowed to distribute.  So I'm on both sides.  But surely such a system is broken when we're withholding information from each other that anyone could get for hundreds of dollars in travel fees, on specimens that won't be redescribed soon anyway.

I'd be willing to throw my (distributable) photos into such a project if someone were to set it up.  The primary obstacle besides getting museum permission would be the huge storage space, but it could probably even be done on Flickr or Picasa.  What does everyone think?


  1. Isn't this what Morphobank already does? The Paleobiology Database also allows uploading of photos (although this feature is very poorly advertised).

  2. I wish the search function in 'morphbank.net' (not confusing it with 'morphobank.org') was better organised.

  3. As I said on the DML, with regard to photo sharing, I think a LOT of other palaeontologists would agree with you, myself included.

    150ish signatures of support to such effect can be found here.

    We can easily demonstrate support for change; the question in my mind is, how do we get change to *happen*?