
Linking Prospective and Retrospective Provenance for Scripts

Full APA Reference

Dey, S., Belhajjame, K., Koop, D., Raul, M., & Ludäscher, B. (2015). Linking Prospective and Retrospective Provenance for Scripts. In Intl. Workshop on Theory and Practice of Provenance (TaPP).

Publication Abstract

Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have been recently proposed to capture the provenance of script executions. The provenance information recorded can be used, e.g., to trace the lineage of a particular result by identifying the data inputs and the processing steps that were used to produce it. By and large, the provenance information captured for scripts is fine-grained in the sense that it captures data dependencies at the level of script statements, and does so for every variable within the script. While useful, the amount of recorded provenance information can be overwhelming for users and cumbersome to use. This suggests the need for abstraction mechanisms that focus attention on specific parts of provenance relevant for analyses. Toward this goal, we propose that fine-grained provenance information recorded as the result of script execution can be abstracted using user-specified, workflow-like views. Specifically, we show how the provenance traces recorded by noWorkflow can be mapped to the workflow specifications generated by YesWorkflow from scripts based on user annotations. We examine the issues in constructing a successful mapping, provide an initial implementation of our solution, and present competency queries illustrating how a workflow view generated from the script can be used to explore the provenance recorded during script execution.
