
Retrospective provenance without a runtime provenance recorder

Full APA Reference

McPhillips, T., Bowers, S., Belhajjame, K., & Ludäscher, B. (2015, July). Retrospective provenance without a runtime provenance recorder. In Proceedings of the 7th USENIX Conference on Theory and Practice of Provenance (pp. 1-1). USENIX Association.

Publication Abstract

The YesWorkflow (YW) toolkit aims to provide users of scripting languages such as Python, Perl, and R with many of the benefits of scientific workflow automation. YW requires neither the use of a workflow engine nor the overhead of adapting or instrumenting code to run in such a system. Instead, YW enables scientists to annotate their scripts with special comments that reveal the main computational blocks and dataflow dependencies otherwise implicit in scripts. YW tools extract and analyze these comments, represent scripts in terms of entities based on a typical scientific workflow model, and provide graphical workflow views (i.e., prospective provenance) of scripts. In this paper, we present a new extension of YW for inferring retrospective provenance from script executions without relying on a runtime provenance recorder. Instead we exploit the common practice of scientists to embed important pieces of provenance in directory structures and file names. For such “provenance-friendly” data organizations, we offer a new annotation mechanism based on URI templates. YW uses these to link conceptual-level prospective provenance with data files created at runtime, resulting in a powerful, integrated model of prospective and retrospective provenance. We present scientifically meaningful retrospective provenance queries for investigating an execution of a data acquisition workflow implemented as a Python script, and show how these queries can be evaluated using the YW toolkit.
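The core idea of recovering provenance from "provenance-friendly" file names can be illustrated with a small sketch. It assumes a URI template of the kind that might be attached to an output through a YW `@uri` annotation; the template, its variable names (`sample_id`, `energy`, `frame_number`), and the file listing below are hypothetical. The sketch compiles the template into a regular expression with one named group per variable, matches it against files found on disk after a run, and returns the variable values each file name encodes. It is an illustrative reimplementation of the idea, not the YW toolkit's own code.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical URI template, in the style of a YW @uri annotation on an output.
TEMPLATE = "run/data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.raw"

_VAR = re.compile(r"\{(\w+)\}")


def template_to_regex(template: str) -> re.Pattern:
    """Compile a URI template into a regex with one named group per variable.

    A variable that appears more than once must match the same text every
    time, so later occurrences become backreferences to the first group.
    """
    parts: List[str] = []
    seen = set()
    pos = 0
    for m in _VAR.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")  # must repeat the earlier value
        else:
            # A variable value may not span a directory separator.
            parts.append(f"(?P<{name}>[^/]+?)")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def match_files(template: str, paths: Iterable[str]) -> List[Dict[str, str]]:
    """Return one record (path plus variable values) per matching path."""
    pattern = template_to_regex(template)
    records = []
    for path in paths:
        m = pattern.fullmatch(path)
        if m:
            records.append({"path": path, **m.groupdict()})
    return records


if __name__ == "__main__":
    # Hypothetical file listing gathered after one script run.
    files = [
        "run/data/DRT240/DRT240_10000eV_001.raw",
        "run/data/DRT240/DRT240_11000eV_001.raw",
        "run/data/DRT322/DRT240_10000eV_001.raw",  # sample_id disagrees: no match
        "run/log.txt",                              # not covered by the template
    ]
    for record in match_files(TEMPLATE, files):
        print(record)
```

Requiring a repeated variable to take the same value in the directory and the file name is a design choice of this sketch: it rejects files whose path is internally inconsistent rather than guessing which occurrence is correct. Each returned record could then be linked to the annotated output it instantiates, which is the kind of prospective-to-retrospective link the abstract describes.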
