OPM1-01Review-Inferences < Challenge

Challenge.OPM1-01Review-Inferences

Open Provenance Model Contents

6 Inferences

The Open Provenance Model has defined the notion of OPM graph based on a set of syntactic rules and the notion of Provenance Graph adding a set of topological constraints. Provenance graphs are aimed at representing causality graphs explaining how processes and artifacts came out to be. It is expected that a variety of reasoning algorithms will exploit this data model, in order to provide novel and powerful functionality to users. It is beyond the scope of this document to include an extensive coverage of relevant reasoning algorithms. However, provenance graphs, by means of edges, capture causal dependencies, which can be summarised by means of transitive closure that we describe in this section.

6.1 One Step Inferences

In Section 2, we have introduced the two causal dependencies triggeredBy and wasDerivedFrom acting as abbreviation for causal dependencies used and wasGeneratedBy. Figure 9 shows their exact meaning.

Figure 9: One Step Inference in the Provenance Model

Figures 10 and 11 formalize Figure 9 by introducing rules for each inference that can be performed in the Open Provenance Model. A rule consists of two expressions separated by a horizontal line. The expression above the line is a hypothesis, whereas the expression below the line is a conclusion that can be inferred from the hypothesis.

In Equation (1), a wasTriggeredBy edge is inferred from the existence of a used and wasGeneratedBy edges, as per described in Figure 9. We note that the inferred \triggeredBy edge relies on both accounts acc₂ and acc₃, hence, it is given _acc₂ ∪ acc₃ as account.

Figure 10: One Step Inference Rules (1)

Equation (2) is the reverse of Equation (1): it allows us to establish that the edge < p₂,p₁,acc > ∈ WasTriggeredBy is hiding the existence of some artifact a₂, used by p₂ and generated by p₁. The inferred edges used and wasGeneratedBy are asserted in the context of some account acc₂ and acc₃, whose union is the original account acc. We note that Equation (2) allows us to establish the existence of some artifact a₂ (and r₁,r₂,acc₁,acc₂) but it does not tell us what their ids and values are. This is the consequence of using wasTriggeredBy , which is a lossy summary of the composition of used and wasGeneratedBy.

The kind of inferences that can be made about wasDerivedFrom is of a different nature. Indeed, without any internal knowledge of P₁ in Figure 9, it is impossible to ascertain there is an actual data dependency between A₁ and A₂.

Remark. Concretely, a rule such as the following would lead to incorrect inferences since it allows arbitrary outputs to a process to be inferred to be dependent on arbitrary inputs to the same process.


⟨a₂,r₂,p₁,acc₂⟩∈WasGeneratedBy ∧ ⟨p₁,r₁,a₁,acc₁⟩∈Used



⟨a₂,a₁,acc₁⋃ acc₂⟩∈WasDerivedFrom

While it is unreasonable to infer an exact dependency by means of wasDerivedFrom , it is useful to be able to infer that a dependency may exist. To this end, we introduce the edge mayHaveBeenDerivedFrom that marks such a potential dependency. Hence, if < a₁,a₂ > ∈ WasDerivedFrom, then < a₁,a₂ > ∈ MayHaveBeenDerivedFrom, but not vice-versa. Hence, Equation (3) states that a mayHaveBeenDerivedFrom edge can be derived from the existence of a succession of wasGeneratedBy and used edges. Equation (4) is to (2) what wasDerivedFrom is to wasTriggeredBy.

Figure 11: One Step Inference Rules (2)

In rules 1 and 3, the inferred edges have accounts acc₂ ∪ acc₃ and acc₁ ∪ acc₂, respectively. Hence, the artifacts and processes connected by these edges will have an effective account membership modified accordingly. We note that rules 1 and 3 effectively creates relationships in the union of multiple account views.

6.2 Transitive Closure

Users want to find out the causes of an artifact, not due to one process, but potentially, due to an unknown number of them.

Hence, for the purpose of expressing queries or expressing inferences about provenance graphs, we introduce four new relationships, which are transitive versions of existing relationships, namely Used^*, WasGeneratedBy^*, WasDerivedFrom^* and TriggeredBy^*. Their definitions are displayed in Figure 12. We note that Figure 12 contains definitions (as opposed to inference rules of Figures 10 and 11, which specify which edges can be inferred from which edges). For convenience, we have also introduced a generic causal dependency wasDependentOn^∗ (see equations (9) to (12)). Note that similar inference rules can be defined for MayHaveBeenDerivedFrom.

Figure 12: Transitive Closures

Equations 7 and 8 are one of the multiple possible ways of defining edges used^* and wasGeneratedBy^*. Other definitions could be expressed and proved equivalent (such as used^* can be derived from a single used and _wasDerivedFrom^*).

Comments

In my mind, 'wasDerivedFrom' has the same meaning as 'mayHaveBeenDerivedFrom', since if the process model is looked into far enough, almost any causality might be found to be false. I don't think we should have two different types of linked; we might name the remaining one 'mayHaveBeenDerivedFrom' if it helps people agree on the meaning.

-- PatrickPaulson - 18 Aug 2008

to top

End of topic
Skip to action links | Back to top

Attachment	Action	Size	Date	Who
graph.jpg	manage	113.5 K	30 Jul 2008 - 18:47	PaulGroth
fig10.jpg	manage	44.1 K	31 Jul 2008 - 03:09	PaulGroth
fig11.jpg	manage	45.2 K	31 Jul 2008 - 03:10	PaulGroth
fig12.jpg	manage	140.9 K	31 Jul 2008 - 03:10	PaulGroth

You are here: Challenge > OPM > OPM1-01Review > OPM1-01Review-Inferences

to top

Provenance Challenge

6 Inferences

6.1 One Step Inferences

6.2 Transitive Closure

Comments