CIRSS Director Professor Bertram Ludäscher presented the keynote address at WORKS 2015
November 15, 2015
(Originally published at the GSLIS newsroom)
From the abstract: An often touted advantage for using scientific workflow systems is their ability to capture provenance information during execution. The idea is that a controlled environment such as a workflow system makes it easy to record relevant observables, e.g., data read and write events. The captured provenance can then be used to document data lineage, to debug faulty runs, to speed up reruns of workflows by reusing unchanged parts, or more generally, to support the reproducibility of computational science experiments. In this talk, I will first give an overview of different notions, forms, and research questions around data and workflow provenance. . . . In the second part of the talk I will take a critical look at the current use of provenance information from scientific workflows and scripts and argue that open, interoperable tools are needed that can combine different forms of available provenance, e.g., recorded or reconstructed retrospective provenance together with prospective provenance given by a workflow specification or via high-level user-defined annotations in scripts. To this end, I will describe YesWorkflow, a new project and toolkit under development that combines different forms of provenance information to allow users to answer questions about the data created and used during workflow runs and script executions.
Dr. Ludäscher, the director of the Center for Informatics Research in Science and Scholarship (CIRSS) at GSLIS, is an expert in data and knowledge management, focusing on the modeling, design, and optimization of scientific workflows, provenance, data integration, and knowledge representation. He is one of the founders of the open-source Kepler scientific workflow system project and a co-lead of the DataONE Working Group on Provenance in Scientific Workflows. Ludäscher also develops workflow technology for quality control and data curation, for example of biodiversity data in natural history collections. He leads the NSF-funded Euler project, where he is developing logic-based methods for the alignment and merging of biological taxonomies. In addition to his role at GSLIS, Ludäscher holds an appointment at NCSA and an affiliate appointment in the Department of Computer Science. He received his MS in computer science from the Technical University of Karlsruhe in 1992 and his PhD in computer science from the University of Freiburg in 1998.