Open Provenance Model (v1) Review
The first OPM specification can be downloaded from here.
Introduction
This page is the entry point for all comments and reviews on the OPM.
Please write your feedback and comments on this twiki page. Feel free to create new pages for specific discussion threads.
The review period is Jan 1, 2008 - Feb 15, 2008.
The mailing list provenance-challenge@ipaw.info, normally used for challenge-related issues, can also be used for this purpose. See the Challenge page for how to subscribe.
Note: the registration page for this twiki can be found at http : / / twiki.ipaw.info / bin / view / TWiki / TWikiRegistration2 (remove all spaces). Then email LucMoreau, requesting that your username be added to the challenge editors group.
Comments
enter your comments here
From Paul Groth (pgroth at isi dot edu):
Overall, I like the basic concepts defined in the OPM, particularly the notion of having multiple accounts. However, for a spec that is supposed to be implemented by many different provenance systems, I am a bit concerned by some of the document's complexity. I think the easier the spec is to understand, the more likely it is to be adopted.
- It is not clear to me what role the inference section plays in the document. As the document states, there are many different ways of reasoning about causal graphs; depending on the type of inference procedure used, there may be other valid or core inferences. Why these? I think this document should only describe the common representation of provenance graphs. Maybe the section about inferences should be moved to a separate document that discusses the relationship between the rules proposed here and other, more detailed mathematical treatments of causality (see Pearl).
- Following on from the above comment, I think Section 8 should really be renamed to "Time Constraints", not "Time Constraints and Inferences".
- To further simplify the model, I wonder whether the concept of an agent is needed at all. An agent seems to be another form of a process. Institutions, individuals and operating systems (the examples of agents given in the paper) are processes: they have a lifetime, and they are composed of a series of actions which result in artefacts. Why can't a process give context to other processes? It seems to me that a process can be the catalyst of other processes, as well as enable, facilitate, control and affect other processes' execution. If the concern is attribution, I think this is best handled through convention. For example, always use the same name for the process representing institution X.
- If agents were removed, there would be no need for the best practice section at the end of the document. "Best practice" seems a bit much given that there is not yet much practice in using the OPM.
- As I stated before, I like the notion of multiple accounts. However, I think there needs to be a mechanism to assign attribution to an account. It is crucial to be able to know who or what is responsible for a given account, as it allows judgements to be made about the reliability of one account compared to another. Attribution could be introduced by a simple map from a responsible party (as represented by a process) to an account.
- In Section 5, you revisit each of the rules in Section 4; however, you do not address rules 14 - 17. To be complete, this section should say something about them.
- In Section 4, why is it necessary to state, in rule 17, that a provenance graph does not need time annotations? I think this rule is redundant, as the OPM is clearly defined as being timeless.
- In the paragraph following Definition 5, I'm not sure what you really want to say or mean by "processes are modelled as instantaneous".
- Finally, while not directly related to the document, I think that maybe the availability of a toy API alongside the document would aid developers in understanding the OPM. The OPM is simple enough that any subsequent changes to the API would be trivial to reflect.
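
As an illustration of two of the suggestions above (a map from a responsible process to an account, and a toy API), a minimal Python sketch might look like the following. Every class, method, node and account name here is hypothetical and is not taken from the OPM specification; relation labels are treated as free-form strings, and agents are deliberately left out, in line with the comment that they could be modelled as processes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: only two node kinds, since agents are modelled as processes here.
NODE_KINDS = {"artifact", "process"}


@dataclass(frozen=True)
class Edge:
    effect: str    # id of the node that depends on the cause
    cause: str     # id of the node it depends on
    relation: str  # free-form label chosen by the caller (illustrative only)
    account: str   # name of the account that asserts this edge


@dataclass
class ProvenanceGraph:
    nodes: Dict[str, str] = field(default_factory=dict)        # node id -> kind
    edges: List[Edge] = field(default_factory=list)
    responsible: Dict[str, str] = field(default_factory=dict)  # account -> process id

    def add_node(self, node_id: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def attribute(self, account: str, process_id: str) -> None:
        # The responsible party must itself be a process in the graph.
        if self.nodes.get(process_id) != "process":
            raise ValueError(f"{process_id} is not a known process")
        self.responsible[account] = process_id

    def add_edge(self, effect: str, cause: str, relation: str, account: str) -> None:
        for node_id in (effect, cause):
            if node_id not in self.nodes:
                raise KeyError(node_id)
        # Refuse edges in accounts nobody is responsible for.
        if account not in self.responsible:
            raise ValueError(f"account {account} has no responsible process")
        self.edges.append(Edge(effect, cause, relation, account))

    def view(self, account: str) -> List[Edge]:
        # All edges asserted within a single account.
        return [e for e in self.edges if e.account == account]


# Usage: two accounts of the same execution, each attributed to a process.
g = ProvenanceGraph()
for node_id, kind in [
    ("institution-x", "process"),
    ("workflow-engine", "process"),
    ("align", "process"),
    ("input.dat", "artifact"),
    ("output.dat", "artifact"),
]:
    g.add_node(node_id, kind)

g.attribute("detailed", "workflow-engine")
g.attribute("coarse", "institution-x")

g.add_edge("align", "input.dat", "used", "detailed")
g.add_edge("output.dat", "align", "wasGeneratedBy", "detailed")
g.add_edge("output.dat", "input.dat", "wasDerivedFrom", "coarse")
```

With this structure, a consumer comparing the "detailed" and "coarse" accounts can look up `g.responsible` to see which process stands behind each one before deciding which to trust.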