The Open Provenance Model (v1.01)
Contents
- Introduction (Below)
- Basics
- Overlapping and Hierarchical Descriptions
- Provenance Graph Definition
- Timeless Formal Model
- Inferences
- Formal Model and Time Annotations
- Time Constraints and Inferences
- Support for Collections
- Example of Representation
- Conclusion
- Best Practice on the Use of Agents
- References
Notes on the Wiki Version
This is a wiki version of the Open Provenance Model version 1.01, based on the authoritative PDF at http://eprints.ecs.soton.ac.uk/16148/1/opm-v1.01.pdf. It is designed to help track comments and suggestions for the next revision of the OPM.
Each section has its own comment area. Please leave your comments there, and add your signature so that we know the provenance of the comments. If you do not want to modify the wiki itself, there is a comment box that you can use.
If you really need to leave comments within the text itself, please use another color and try to make your comment stand out from the rest of the text.
Authors
Luc Moreau (Editor) (U. of Southampton)
Beth Plale (Indiana U.)
Simon Miles (King’s College)
Carole Goble, Paolo Missier (Manchester U.)
Roger Barga, Yogesh Simmhan (Microsoft)
Joe Futrelle, Robert E. McGrath, Jim Myers (NCSA)
Patrick Paulson (PNNL)
Shawn Bowers, Bertram Ludaescher (U. Davis)
Natalia Kwasnikowska, Jan Van den Bussche (U. Hasselt)
Tommy Ellkvist, Juliana Freire (U. Utah)
Paul Groth (USC)
Abstract
In this paper, we introduce the Open Provenance Model, a model for provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define the model in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
1 Introduction
Provenance is well understood in the context of art or digital libraries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle [5]. Interest in provenance in the "e-science community" [13] is also growing, since provenance is perceived as a crucial component of workflow systems [2] that can help scientists ensure reproducibility of their scientific analyses and processes.
Against this background, the International Provenance and Annotation Workshop (IPAW'06), held on May 3-5, 2006 in Chicago, involved some 50 participants interested in the issues of data provenance, process documentation, data derivation, and data annotation [8, 1]. During a session on provenance standardization, a consensus began to emerge, whereby the provenance research community needed to understand better the capabilities of the different systems, the representations they used for provenance, their similarities, their differences, and the rationale that motivated their designs.
Hence, the first Provenance Challenge was born, and from the outset, the challenge was set up to be informative rather than competitive. Its purpose was to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. Participants simulated or ran a Functional Magnetic Resonance Imaging workflow, from which they implemented and executed a pre-identified set of "provenance queries". Sixteen teams responded to the challenge, and reported their experience in a journal special issue [10].
The first Provenance Challenge was followed by the second Provenance Challenge, which aimed to establish interoperability between systems by exchanging provenance information. Thirteen teams [12] responded to this second challenge. Discussions indicated that there was substantial agreement on a core representation of provenance. As a result, following a workshop in August 2007 in Salt Lake City, a data model was crafted and released as the Open Provenance Model (v1.00) [9].
The starting point of this work is the community agreement summarized by Miles [7]. We assume that provenance of objects (whether digital or not) is represented by an annotated causality graph, which is a directed acyclic graph, enriched with annotations capturing further information pertaining to execution. For the purpose of this paper, a provenance graph is defined to be a record of a past execution (or current execution), and not a description of something that could happen in the future.
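As an informal illustration of what "annotated causality graph" means, the following Python sketch stores effect-to-cause edges with free-form annotations and checks that the graph is acyclic. It is not part of the OPM and does not reflect any syntax or internal representation the model prescribes. The class `ProvenanceGraph`, its methods, and the node names are hypothetical, chosen only for this example; the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class ProvenanceGraph:
    """Hypothetical container for an annotated causality graph (not OPM syntax)."""

    # Maps each node to the set of nodes it causally depends on.
    causes: dict = field(default_factory=dict)
    # Maps an (effect, cause) edge to a dictionary of annotations.
    annotations: dict = field(default_factory=dict)

    def add_dependency(self, effect, cause, **annotation):
        """Record that `effect` was caused by `cause`, with optional annotations."""
        self.causes.setdefault(effect, set()).add(cause)
        self.causes.setdefault(cause, set())
        if annotation:
            self.annotations[(effect, cause)] = annotation

    def is_acyclic(self):
        """Return True if the recorded dependencies form a directed acyclic graph."""
        try:
            TopologicalSorter(self.causes).prepare()
        except CycleError:
            return False
        return True

    def causal_order(self):
        """List nodes so that every cause appears before its effects."""
        return list(TopologicalSorter(self.causes).static_order())


# Illustrative record of a past execution (all names are made up).
g = ProvenanceGraph()
g.add_dependency("cake", "bake", oven="180C")
g.add_dependency("bake", "flour")

# A graph with a causal loop is not a valid causality graph in this sketch.
bad = ProvenanceGraph()
bad.add_dependency("a", "b")
bad.add_dependency("b", "a")
```

Here `g.is_acyclic()` holds and `g.causal_order()` places `flour` before `bake` and `bake` before `cake`, whereas `bad.is_acyclic()` does not hold. Storing annotations separately from the edges keeps the causal structure itself minimal, which is one possible design among many.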
The Open Provenance Model (OPM) is a model for provenance that is designed to meet the following requirements:
- To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
- To allow developers to build and share tools that operate on such a provenance model.
- To define the model in a precise, technology-agnostic manner.
- To support a digital representation of provenance for any "thing", whether produced by computer systems or not.
- To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
While specifying this model, we also have some _non_-requirements:
- It is not the purpose of this document to specify the internal representations that systems have to adopt to store and manipulate provenance internally; systems remain free to adopt internal representations that are fit for their purpose.
- It is not the purpose of this document to define a computer-parsable syntax for this model; model implementations in XML, RDF or others will be specified in separate documents, in the future.
- We do not specify protocols to store such provenance information in provenance repositories.
- We do not specify protocols to query provenance repositories.
On June 19th 2008, twenty participants attended the first OPM workshop [3] to discuss version 1.00 of the specification. Minutes of the workshop and recommendations [4] were published, and led to the current version (v1.01) of the Open Provenance Model.
Comments