
Provenance Challenge

Challenge
Challenge.OPM1-01Review-Inferences

Open Provenance Model Contents
  1. Introduction
  2. Basics
  3. Overlapping and Hierarchical Descriptions
  4. Provenance Graph Definition
  5. Timeless Formal Model
  6. Inferences
  7. Formal Model and Time Annotations
  8. Time Constraints and Inferences
  9. Support for Collections
  10. Example of Representation
  11. Conclusion
  12. Best Practice on the Use of Agents
  13. References

6 Inferences

The Open Provenance Model defines the notion of OPM graph through a set of syntactic rules, and the notion of provenance graph by adding a set of topological constraints. Provenance graphs are intended to represent causality graphs that explain how processes and artifacts came to be. It is expected that a variety of reasoning algorithms will exploit this data model in order to provide novel and powerful functionality to users. Extensive coverage of such reasoning algorithms is beyond the scope of this document. However, the edges of a provenance graph capture causal dependencies, which can be summarised by means of the transitive closure that we describe in this section.

6.1 One Step Inferences

In Section 2, we introduced the two causal dependencies wasTriggeredBy and wasDerivedFrom, which act as abbreviations for the causal dependencies used and wasGeneratedBy. Figure 9 shows their exact meaning.

Figure 9: One Step Inference in the Provenance Model

Figures 10 and 11 formalize Figure 9 by introducing rules for each inference that can be performed in the Open Provenance Model. A rule consists of two expressions separated by a horizontal line. The expression above the line is a hypothesis, whereas the expression below the line is a conclusion that can be inferred from the hypothesis.

In Equation (1), a wasTriggeredBy edge is inferred from the existence of a used edge and a wasGeneratedBy edge, as described in Figure 9. We note that the inferred wasTriggeredBy edge relies on both accounts acc2 and acc3; hence, it is given acc2 ∪ acc3 as its account.

Figure 10: One Step Inference Rules (1)
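As an illustration only, the forward rule of Equation (1) can be sketched in Python. The tuple layouts, the class names and the representation of an account as a frozenset of account names are assumptions made for this sketch; they are not taken from Figure 10.

```python
from typing import FrozenSet, NamedTuple, Set


class Used(NamedTuple):
    # Hypothetical layout: <process, role, artifact, accounts>
    process: str
    role: str
    artifact: str
    accounts: FrozenSet[str]


class WasGeneratedBy(NamedTuple):
    # Hypothetical layout: <artifact, role, process, accounts>
    artifact: str
    role: str
    process: str
    accounts: FrozenSet[str]


class WasTriggeredBy(NamedTuple):
    # Hypothetical layout: <effect process, cause process, accounts>
    effect: str
    cause: str
    accounts: FrozenSet[str]


def infer_was_triggered_by(
    used: Set[Used], generated: Set[WasGeneratedBy]
) -> Set[WasTriggeredBy]:
    """Infer p2 wasTriggeredBy p1 whenever p2 used an artifact generated by p1."""
    inferred = set()
    for u in used:
        for g in generated:
            if u.artifact == g.artifact:
                # The inferred edge lives in the union of both accounts.
                inferred.add(
                    WasTriggeredBy(u.process, g.process, u.accounts | g.accounts)
                )
    return inferred


used_edges = {Used("p2", "in", "a1", frozenset({"acc2"}))}
generated_edges = {WasGeneratedBy("a1", "out", "p1", frozenset({"acc3"}))}
triggered = infer_was_triggered_by(used_edges, generated_edges)
```

With these two asserted edges, the sketch yields a single wasTriggeredBy edge from p2 to p1 whose account is the union of acc2 and acc3.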

Equation (2) is the reverse of Equation (1): it allows us to establish that the edge ⟨p2,p1,acc⟩ ∈ WasTriggeredBy is hiding the existence of some artifact a2, used by p2 and generated by p1. The inferred edges used and wasGeneratedBy are asserted in the context of some accounts acc2 and acc3, whose union is the original account acc. We note that Equation (2) allows us to establish the existence of some artifact a2 (and r1, r2, acc2, acc3), but it does not tell us what their ids and values are. This is the consequence of using wasTriggeredBy, which is a lossy summary of the composition of used and wasGeneratedBy.

The kind of inferences that can be made about wasDerivedFrom is of a different nature. Indeed, without any internal knowledge of P1 in Figure 9, it is impossible to ascertain there is an actual data dependency between A1 and A2.

Remark. Concretely, a rule such as the following would lead to incorrect inferences, since it allows arbitrary outputs of a process to be inferred to be dependent on arbitrary inputs of the same process.

⟨a2,r2,p1,acc2⟩ ∈ WasGeneratedBy ∧ ⟨p1,r1,a1,acc1⟩ ∈ Used
------------------------------------------------------------
⟨a2,a1,acc1 ∪ acc2⟩ ∈ WasDerivedFrom

While it is unreasonable to infer an exact dependency by means of wasDerivedFrom, it is useful to be able to infer that a dependency may exist. To this end, we introduce the edge mayHaveBeenDerivedFrom, which marks such a potential dependency. If ⟨a1,a2⟩ ∈ WasDerivedFrom, then ⟨a1,a2⟩ ∈ MayHaveBeenDerivedFrom, but not vice versa. Accordingly, Equation (3) states that a mayHaveBeenDerivedFrom edge can be derived from the existence of a succession of wasGeneratedBy and used edges. Equation (4) is to (2) what wasDerivedFrom is to wasTriggeredBy.

Figure 11: One Step Inference Rules (2)
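The forward rule of Equation (3) can be sketched in the same spirit. This is a minimal sketch under stated assumptions: edges are plain tuples, accounts are frozensets of account names, and the function name is hypothetical. Unlike the rule rejected in the remark above, the conclusion is only a mayHaveBeenDerivedFrom edge, so no definite data dependency is claimed.

```python
def infer_may_have_been_derived_from(generated, used, derived=frozenset()):
    """Return mayHaveBeenDerivedFrom edges as (a2, a1, accounts) tuples.

    Assumed tuple layouts (not taken from Figure 11):
      generated: (artifact, role, process, accounts)
      used:      (process, role, artifact, accounts)
      derived:   (artifact, artifact, accounts), asserted wasDerivedFrom edges
    """
    # Every asserted wasDerivedFrom edge is also a mayHaveBeenDerivedFrom edge.
    may = set(derived)
    for (a2, _r2, p_out, acc2) in generated:
        for (p_in, _r1, a1, acc1) in used:
            if p_out == p_in:
                # a2 was generated by a process that used a1, so a dependency
                # MAY exist; the edge is placed in the union of both accounts.
                may.add((a2, a1, acc1 | acc2))
    return may


generated_edges = {("a2", "out", "p1", frozenset({"acc2"}))}
used_edges = {("p1", "in", "a1", frozenset({"acc1"}))}
may_edges = infer_may_have_been_derived_from(generated_edges, used_edges)
```

Here the only inferred edge links a2 to a1 in the account acc1 ∪ acc2.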

In rules 1 and 3, the inferred edges have accounts acc2 ∪ acc3 and acc1 ∪ acc2, respectively. Hence, the effective account membership of the artifacts and processes connected by these edges is modified accordingly. We note that rules 1 and 3 effectively create relationships in the union of multiple account views.

6.2 Transitive Closure

Users want to find out the causes of an artifact that are due not just to one process but, potentially, to an unknown number of them.

Hence, for the purpose of expressing queries or inferences about provenance graphs, we introduce four new relationships, which are transitive versions of existing relationships, namely Used*, WasGeneratedBy*, WasDerivedFrom* and WasTriggeredBy*. Their definitions are displayed in Figure 12. We note that Figure 12 contains definitions (as opposed to the inference rules of Figures 10 and 11, which specify which edges can be inferred from which edges). For convenience, we have also introduced a generic causal dependency wasDependentOn (see Equations (9) to (12)). Note that similar inference rules can be defined for MayHaveBeenDerivedFrom.

Figure 12: Transitive Closures

Equations 7 and 8 give one of several possible ways of defining the edges used* and wasGeneratedBy*. Other definitions could be expressed and proved equivalent (for example, used* can be derived from a single used and wasDerivedFrom*).
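
To make the idea of a transitive version of a relationship concrete, the following sketch computes a closure over wasDerivedFrom pairs and then composes it with used. It does not reproduce the definitions of Figure 12: accounts and roles are dropped, edges are reduced to (effect, cause) pairs, the closure is taken over one or more steps, and all function names are hypothetical.

```python
def transitive_closure(edges):
    """Return the transitive closure (one or more steps) of a set of pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


def used_star(used, derived):
    """One possible reading: a single used edge followed by wasDerivedFrom*."""
    derived_star = transitive_closure(derived)
    result = set(used)
    for (p, a) in used:
        for (a2, b) in derived_star:
            if a == a2:
                result.add((p, b))
    return result


# a3 wasDerivedFrom a2, a2 wasDerivedFrom a1, and p1 used a3.
derived_edges = {("a3", "a2"), ("a2", "a1")}
used_edges = {("p1", "a3")}
derived_closure = transitive_closure(derived_edges)
used_closure = used_star(used_edges, derived_edges)
```

In this example the closure adds the pair (a3, a1), and p1 is found to depend, possibly indirectly, on a3, a2 and a1.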


Comments

In my mind, 'wasDerivedFrom' has the same meaning as 'mayHaveBeenDerivedFrom', since if the process model is looked into far enough, almost any causality might be found to be false. I don't think we should have two different types of link; we might name the remaining one 'mayHaveBeenDerivedFrom' if it helps people agree on the meaning.

-- PatrickPaulson - 18 Aug 2008



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.