PASS terminology
ancestry information - provenance records, mostly cross-references between pnodes, representing the flow of data during execution. Ancestry information may also be flat text records; an example might be "the URL I downloaded this from".
cross-reference record - a provenance record that points to another pnode. So far, cross-references point to specific versions, not whole objects, but we don't necessarily claim that a cross-reference pointing to a whole object is inherently ill-formed. Cross-reference records may be either ancestry information or identity information.
cycle - if an object's ancestry includes itself, directly or indirectly, its provenance contains a cycle.
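The definition above amounts to a reachability test on the ancestry graph. The following is a minimal sketch, assuming ancestry is held as a dictionary from (object, version) pairs to the pairs they were derived from; that representation and the name `in_own_ancestry` are illustrative assumptions, not part of PASS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each key is an (object, version) pair and
# each value lists the (object, version) pairs it was derived from.
Node = Tuple[str, int]
Ancestry = Dict[Node, List[Node]]


def in_own_ancestry(graph: Ancestry, start: Node) -> bool:
    """Return True if `start` appears among its own ancestors."""
    seen: Set[Node] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node == start:
            return True          # reached ourselves: the provenance has a cycle
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


# Distinct versions of "a": no cycle.
acyclic = {("a", 2): [("b", 1)], ("b", 1): [("a", 1)]}
# Versions folded together: "a" version 1 is its own indirect ancestor.
cyclic = {("a", 1): [("b", 1)], ("b", 1): [("a", 1)]}
```

The two example graphs differ only in whether the versions of "a" are kept distinct, which is why folding versions together can introduce cycles.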
cycle-breaking - algorithms or steps taken during provenance collection to ensure that the output provenance does not create cycles.
flat record - a provenance record whose value is a text string, as opposed to a cross-reference record. Most flat records are identity information, but some may be ancestry information.
freeze - a version of an object is considered complete when it is frozen. Exactly when objects should be frozen can be debated at some length. Most of the proposed cycle-breaking algorithms work by forcing additional freezes.
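To illustrate how a forced freeze can break a cycle, here is one naive possibility, offered as a sketch rather than as any of the proposed PASS algorithms. It assumes the ancestry graph is a dictionary keyed by (object, version) pairs and that the child passed in is always the latest version of its object; `reaches` and `add_ancestor` are hypothetical names.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]              # (object name, version); hypothetical
Ancestry = Dict[Node, List[Node]]   # node -> nodes it was derived from


def reaches(graph: Ancestry, src: Node, target: Node) -> bool:
    """Return True if `target` is `src` or one of its ancestors."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False


def add_ancestor(graph: Ancestry, child: Node, parent: Node) -> Node:
    """Record that `child` was derived from `parent`.

    If the new edge would close a cycle, freeze `child` and start a new
    version of it instead. Assumes `child` is the latest version of its
    object. Returns the node that actually received the edge.
    """
    if reaches(graph, parent, child):
        name, version = child
        child = (name, version + 1)        # forced freeze, then thaw
        graph[child] = [(name, version)]   # new version derives from the old
    graph.setdefault(child, []).append(parent)
    return child


graph: Ancestry = {}
add_ancestor(graph, ("proc", 1), ("file", 1))             # process read the file
current = add_ancestor(graph, ("file", 1), ("proc", 1))   # process wrote it back
```

In this example the write lands on a new version of the file, so the older version remains a pure ancestor and the graph stays acyclic. A scheme this simple creates a new version on every such write, which is the kind of behavior that can lead to version pumping.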
identity information - provenance records, mostly flat text records, that describe the identity of an object rather than its creation or history. Identity information may also be cross-reference records; one example we've encountered is "this file was standard input when the process was started."
object or provenanced object - any "thing" for which provenance is collected or stored.
phony - in PASSv2, we have a model for allowing provenance systems to stack on top of one another. Objects tracked by higher levels of a system are often aggregates or subsections of objects that lower levels understand; or sometimes they may be entirely conceptual. In order to manage provenance properly, these objects must be instantiated at the lower level; we call the instantiations phony objects or phonies. Note that whether an object is "phony" depends on your perspective: to a file system, anything that isn't a file is a phony; however, to an application, users and projects and book chapters and other such things are very real, even though they will appear as phonies at lower levels. This term has proven somewhat confusing but has also become entrenched.
pnode (etymology: derived from "inode") - a container for the complete provenance of a single object, including both identity information and ancestry information. The term is also sometimes used as shorthand for the provenance itself.
provenance record - the basic unit of provenance; an attribute/value pair. May be either a flat record (where the value is a text string) or a cross-reference record (where the value is a pointer to another object and version).
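The two kinds of record and the pnode that contains them can be pictured as a small data structure. This is a minimal sketch under stated assumptions: the class names, the integer pnode numbers, and the attribute names "NAME" and "INPUT" are all hypothetical and are not taken from the PASS implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class FlatRecord:
    """Attribute/value pair whose value is a text string."""
    attribute: str
    value: str


@dataclass(frozen=True)
class CrossReference:
    """Attribute/value pair whose value points at another pnode and version."""
    attribute: str
    pnode: int      # hypothetical: pnodes identified by number
    version: int


Record = Union[FlatRecord, CrossReference]


@dataclass
class Pnode:
    """Container for all provenance records of a single object."""
    number: int
    records: List[Record] = field(default_factory=list)

    def cross_references(self) -> List[CrossReference]:
        return [r for r in self.records if isinstance(r, CrossReference)]


p = Pnode(number=7)
p.records.append(FlatRecord("NAME", "/tmp/out.txt"))            # flat record
p.records.append(CrossReference("INPUT", pnode=3, version=2))   # cross-reference
```

Whether a given record counts as identity or ancestry information depends on its attribute, not on which of the two classes it uses.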
recycle - in PASSv2, when an object is completely emptied of data (such as when a file is truncated to zero length, or when a process successfully calls execve), we say that it is recycled.
thaw - start a new version of an object. The version number is incremented. The new version is "under construction" until a freeze occurs.
version - when files (and other things as well) are updated in place, different versions are created over time. A provenance system must keep track of these versions, because the distinctions may be vital. It is excessively expensive to allow every kernel-level write operation to create a new version; however, folding versions together introduces the possibility of cycles and necessitates cycle-breaking. Versions are created at thaw time and are considered complete at freeze time.
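The freeze/thaw lifecycle can be sketched as a small state machine. The class below is illustrative only: the starting version number, the `write` method, and the rule that a write to a frozen object triggers a thaw are assumptions made for the example, not a description of when PASS actually freezes or thaws.

```python
class VersionedObject:
    """Hypothetical object with a freeze/thaw version lifecycle."""

    def __init__(self) -> None:
        self.version = 1      # assumed starting version number
        self.frozen = False   # the first version starts under construction

    def freeze(self) -> None:
        """Mark the current version as complete."""
        self.frozen = True

    def thaw(self) -> None:
        """Start a new version; it is under construction until frozen."""
        self.version += 1
        self.frozen = False

    def write(self, data: bytes) -> int:
        """Apply a write and return the version it landed in.

        Assumption for this sketch: writing to a frozen object thaws it
        first, and writes to an unfrozen object are folded into the
        version already under construction.
        """
        if self.frozen:
            self.thaw()
        return self.version


obj = VersionedObject()
first = obj.write(b"a")    # lands in version 1
second = obj.write(b"b")   # folded into version 1 as well
obj.freeze()
third = obj.write(b"c")    # forces a thaw, lands in version 2
```

Folding the first two writes into one version is what keeps the version count manageable, at the cost of the cycle risk described above.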
version pumping - a form of provenance explosion caused by the interaction of circular data flow with naive cycle-breaking algorithms.
--
PassProject - 24 Feb 2007