
CIRSS Seminar - Introduction to the Whole Tale Project

Friday, September 2, 2016
4pm - 5pm

126 IS

Event Details

Session leaders: Bertram Ludäscher, Matt Turk, Victoria Stodden
Description: PIs Bertram Ludäscher, Matt Turk, and Victoria Stodden will present an introduction to the Whole Tale project, a five-year project newly funded by the NSF CC*DNI DIBBs program. The goal of the Whole Tale project is to enable researchers to examine, transform, and then seamlessly republish research data and code used in an article, supporting new discovery by allowing researchers to construct representations and syntheses of data. Presenters will provide an overview of project activities now getting underway.

The full project abstract is provided below:

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible from scholarly publications. The third layer is broad, encompassing numerous research communities through science pathways (e.g., in astronomy, life and earth sciences, materials science, social science), and deep, using interconnected cyberinfrastructure pathways and shared technologies.

The goal of this project is to strengthen the second layer of research output, and to build a robust third layer that integrates all parts of the story, conveying the holistic experience of reproducible scientific inquiry by (1) exposing existing cyberinfrastructure through popular frontends, e.g., digital notebooks (IPython, Jupyter), traditional scripting environments, and workflow systems; (2) developing the necessary 'software glue' for seamless access to different backend capabilities, including from DataNet federations and Data Infrastructure Building Blocks (DIBBs) projects; and (3) enhancing the complete data-to-publication lifecycle by empowering scientists to create computational narratives in their usual programming environments, enhanced with new capabilities from the underlying cyberinfrastructure (e.g., identity management, advanced data access and provenance APIs, and Digital Object Identifier-based data publications). The technologies and interfaces will be developed and stress-tested using a diverse set of data types, technical frameworks, and early adopters across a range of science domains.

Biosketches

Bertram Ludäscher is Director of the iSchool's Center for Informatics Research in Science and Scholarship (CIRSS), and Professor at the iSchool, NCSA, and the Department of Computer Science. He conducts research in scientific data management, scientific workflows, and data provenance. His research interests also include foundations of databases, knowledge representation, and reasoning. Ludäscher applies this work in a number of domains, including biodiversity informatics and taxonomy.

Matthew Turk is a research scientist at NCSA, interested in analysis and visualization of data, community-building around academic software, and the dynamics of open source ecosystems.

Victoria Stodden is an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign, with affiliate appointments in the School of Law, the Department of Computer Science, the Department of Statistics, the Coordinated Science Laboratory, and the National Center for Supercomputing Applications. She completed both her PhD in statistics and her law degree at Stanford University. Her research centers on the multifaceted problem of enabling reproducibility in computational science. This includes studying adequacy and robustness in replicated results, designing and implementing validation systems, developing standards of openness for data and code sharing, and resolving legal and policy barriers to disseminating reproducible research.
