Two-Stream Model: Toward Data Production for Sharing Field Science Data

Full APA Reference

Baker, K. S., Palmer, C. L., Thomer, A. K., Wickett, K., DiLauro, T., Asangba, A. E., Fouke, B. W., & Choudhury, G. S. (2013, December). Two-Stream Model: Toward Data Production for Sharing Field Science Data. Poster presented at the 46th annual Fall Meeting of the American Geophysical Union, San Francisco, CA.

Publication Abstract

Scientific data play a central role in the production of knowledge reported in scientific publications. Today, data sharing policies together with technological capacity are fueling visions of data as open and accessible, where data appear to stand alone as products of the research process. Yet, guidelines and outputs are constantly being produced that impact subsequent work with the data, particularly in field-oriented, data-rich earth science research. We propose a model that focuses on two distinct yet intertwined data streams: internal-use data and public-reuse data. Internal-use data often involve a complex mix of processing, analysis, and integration strategies, creating data in forms that lead to the publication of papers. Public-reuse data are prepared with a more standardized set of procedures, creating data packages in the form of well-described, parameter-based datasets for release to a data repository and for reuse by others. While scientific researchers are familiar with collecting and analyzing data for publication in the scientific literature, the second data stream helps to identify tasks relating to the preparation of data for future, unanticipated reuse. The second stream represents an expansion in the conceptualization of data management for the majority of natural scientists, from a publication metaphor to recognition of a release metaphor (Parsons and Fox 2012). A combined dual-function model brings attention to some of the less recognized barriers that impede preparation of data for reuse. Digital data analysis spawns a multitude of files often assessed while in use, so for reuse of data, scientists must first identify which data files to share. They must also create robust data processes that frequently involve establishing new distributions of labor. The two-stream approach creates a visual representation for data generators who now must think about what data are most likely to have value not only for their work but also for the work of others.
Development of this approach is part of a collaborative project studying site-based data curation in geobiology for geologists, geochemists, and microbiologists at Yellowstone National Park.