Keep Your Data Alive!

By Hannah Brame, PhD 2021

Pterophyllum nathorsti (fossilized leaves, Jurassic, Douglas County, Oregon), used for a leaf carbon analysis in a graduate student thesis project. The specimen is part of the Oscar M. Ball Plant collection, which now includes original specimen numbers as well as registered IGSNs. Image courtesy of the UT Non-Vertebrate Paleontology Lab.

Last month, Ann Molineux, director and curator of the UT Non-Vertebrate Paleontology Lab, gave a fantastic talk titled “Forensic Tools to Track and Connect Physical Samples to Related Data.” Molineux has been an invaluable member of the UT Austin paleontological data community through her curatorial work and through her outreach emphasizing the importance of high-quality data collection and subsequent data curation. Her talk highlighted the importance of metadata for museum collections and published data, specifically emphasizing careful collection and curation of metadata and the acquisition (and proper citation and implementation) of a unique International Geo Sample Number (IGSN) for all relevant geology- and paleontology-derived data. Although this talk was paleontology-themed, the recommended protocols are applicable to a wide range of geological disciplines. For me as a graduate student, the main takeaways were the importance of maintaining a well-documented and traceable “bread crumb trail” of data, from sweaty and exhausting field notes, to sub-sampling in the lab (e.g., geochemical analyses, thin sectioning, etc.), to publication, as well as a general consideration of data as it pertains to databases, museum collections, and the global sharing of data.

As graduate students, undergraduate researchers, and faculty, we are all embarking on a mission to collect a very specific set of data to answer a very specific set of questions. However, that data can take on a life of its own, being drawn upon by the global community of scientists with a range of scientific disciplines and research objectives. To give a shout-out to the humorous detective novel by Douglas Adams, Dirk Gently’s Holistic Detective Agency, we should think of data in terms of the interconnectedness of all things. Who can say what our data may be used for in the future? We have an obligation and an opportunity to gather and curate data that has applicability beyond that which we can imagine.

Image: Example of core sub-sampling procedure. After a section with plant fossils was archived at the Non-Vertebrate Paleontology Lab, a “place holder” was inserted to mark the location within the Bureau of Economic Geology (BEG) core archive. The “place holder” includes the full suite of BEG, NPL and IGSN identification numbers for the sample.

I would, with the blessing of the global data community, make a call to scientists at large to consider the future applications of their data. We are moving into an era of unprecedented data availability, and the critical quality of data may not be simply what you collected it for, but what it can be used for in the future. I would argue that this is particularly true for data that is exhaustible (i.e., data that is consumed during the data acquisition and analysis process, such as geochemical analysis or tissue sampling), and data that has limited accessibility (e.g., data from international field sites or historical data from field localities that may no longer exist). In the words of glam band Cinderella, “You don’t know what you’ve got (‘til it’s gone).” All of us have looked at historical data and wished, “If only they had included stratigraphic data. If only they had collected abundance data. If only we had a precise geographic location.” I have had those same thoughts about my own data! We can design our data collection to fulfill both our research objectives and the globally recognized standards for high-quality and broadly applicable data. As we graduate and publish, we should take care to properly cite all data, as well as register our data with the appropriate data repositories so that our data can, in turn, be properly referenced.

So, going forward, we should all endeavor to collect the highest quality data possible, and maintain that integrity throughout the lifespan of our data. Reach out to your colleagues. Reach out to our data repositories (e.g., UT’s museums, the Paleobiology Database, VertNet, Global Biodiversity Information Facility, and the links below). Try to imagine your data from the perspective of the next generation of scientists, and make your data the highest quality possible.

For additional information, please see the following:

System for Earth Sample Registration (SESAR):

EarthCube – “The Internet of Samples in the Earth Sciences (iSamples)”:

Interdisciplinary Earth Data Alliance: