New Provenance Queries
At the first provenance challenge workshop, it was suggested that a wider range of provenance queries be devised than was present in the first challenge. The aim of this exercise is to better distinguish between approaches and to better determine what is important to, and within the scope of, provenance systems. During the workshop, a set of general query topics was discussed. Here, we attempt to place them in terms of the fMRI workflow used in the first provenance challenge to make them more concrete. The details of this workflow can be found on the
first provenance challenge page.
We encourage participants to edit this page in the following ways:
- Annotate the queries below with statements of whether they consider such queries important for their users, in scope for their system or irrelevant to provenance.
- Add further provenance queries that demonstrate aspects considered important but not illustrated in the first challenge.
- Edit provenance queries to remove any ambiguities.
Please make it clear on the page who you are when providing each of the above.
The contents of this page will be discussed at the second provenance challenge workshop.
Long-Term Use of Provenance
The fMRI workflow was run by user X on 14 September 2006. Five years later, the workflow has been adapted and run many times, the AIR suite and its data file formats have changed substantially, GIFs are no longer in common use and user X has left research to become a farmer.
- Find the process that led to Atlas X Graphic.
- Given a description of a workflow that is believed to be the same as the one that produced Atlas X Graphic, determine whether this is true.
- Determine which version of the AIR suite is required to parse the image header files.
- Find out who ran the workflow, i.e. the identity of user X.
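To make the first of these queries concrete, here is a minimal sketch of how "find the process that led to Atlas X Graphic" might be answered over a causal record. The artifact and process names follow the challenge workflow, but the record structure (a simple mapping from each artifact to its generating process and that process's inputs) is an assumption for illustration, not any team's actual format.

```python
# Hypothetical provenance record: generated_by[artifact] = (process, inputs).
# The structure is an illustrative assumption; names follow the challenge workflow.
generated_by = {
    "Atlas X Graphic": ("convert", ["Atlas X Slice"]),
    "Atlas X Slice": ("slicer", ["Atlas Image"]),
    "Atlas Image": ("softmean", ["Resliced 1", "Resliced 2"]),
    "Resliced 1": ("reslice", ["Warp Params 1"]),
    "Resliced 2": ("reslice", ["Warp Params 2"]),
    "Warp Params 1": ("align_warp", ["Anatomy Image 1", "Reference Image"]),
    "Warp Params 2": ("align_warp", ["Anatomy Image 2", "Reference Image"]),
}

def process_that_led_to(artifact):
    """Walk backwards through the causal graph, collecting every process
    that contributed to the given artifact."""
    processes = []
    def walk(a):
        if a in generated_by:
            proc, inputs = generated_by[a]
            processes.append(proc)
            for i in inputs:
                walk(i)
    walk(artifact)
    return processes

print(process_that_led_to("Atlas X Graphic"))
```

The long-term aspect of the scenario is precisely what such a record must survive: the query should still be answerable five years on, after the AIR suite, the file formats and the user have all moved on.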
Causal Information Outside the Workflow
The new brain images that were inputs to the challenge workflow, Anatomy Images 1 to 4, were produced by four other workflow executions, run independently some time before the challenge workflow. The preceding workflow takes as input some configuration files, Configuration 1 to 4, for the runs generating each image.
- Find the process that led to Atlas X Graphic, where this should include details of the configuration files used to produce the new brain images merged to create Atlas X Graphic.
Non-Workflow Processes
The new brain images that were inputs to the challenge workflow, Anatomy Images 1 to 4, were emailed in an archive (.tar.gz) file from another (remote) user before being opened and the workflow run.
- Find the process that led to Atlas X Graphic, where this should include the identity of who emailed the archive from which the workflow inputs were taken, and the time at which those images were added to the archive by the remote user.
Multiple Levels of Granularity
The challenge workflow can be abstracted away from its original form, to be seen as composed of only three stages:
- average takes a set of new brain images (plus headers) and a reference image (plus header) and produces an averaged image aligned to the same coordinates as the reference.
- slicer is the same as before.
- convert is the same as before.
The average procedure actually involves executing the original three stages of the challenge workflow.
- Find the provenance of Atlas X Graphic at both levels of granularity, i.e. with average or with align_warp, reslice and softmean.
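One way to support both views is to record which finer-grained steps a composite step is composed of, and let the query expand composites on demand. The sketch below assumes such a refinement table; the representation is illustrative only, not a prescribed format.

```python
# Hypothetical refinement table: a composite step maps to the finer-grained
# steps it is composed of. Step names follow the challenge workflow; the
# table itself is an illustrative assumption.
refines = {
    "average": ["align_warp", "reslice", "softmean"],
}

# Coarse-grained provenance of Atlas X Graphic, as described above.
coarse_provenance = ["average", "slicer", "convert"]

def expand(steps, table):
    """Replace each composite step with its finer-grained sub-steps,
    leaving primitive steps unchanged."""
    out = []
    for s in steps:
        out.extend(table.get(s, [s]))
    return out

print(coarse_provenance)                    # coarse-grained view
print(expand(coarse_provenance, refines))   # fine-grained view
```

A query system could apply expansion recursively if refinements nest more than one level deep.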
One Process Affecting Another
During the enactment of the challenge workflow, another concurrently executing workflow, the corruption workflow, accidentally affects the file Warp Params 1 between its production by align_warp and its use by reslice.
- Find out whether and how the files involved in producing Atlas X Graphic were affected by processes other than the challenge workflow.
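One conceivable way to answer this query is for the provenance record to capture a content digest of each file both when it is produced and when it is used; a mismatch indicates interference by some process outside the workflow. The event-log format below is an assumption for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content digest recorded alongside each provenance event."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical event log: (event, file, digest at that moment).
# The corruption workflow altered Warp Params 1 between production and use.
events = [
    ("produced_by:align_warp", "Warp Params 1", digest(b"original params")),
    ("used_by:reslice",        "Warp Params 1", digest(b"corrupted params")),
]

def externally_affected(filename, log):
    """True if the file's digest differs between recorded events, i.e. some
    process outside the workflow touched it in between."""
    digests = {d for event, f, d in log if f == filename}
    return len(digests) > 1

print(externally_affected("Warp Params 1", events))
```

Identifying *how* the file was affected, as the query asks, would additionally require the corruption workflow's own provenance record to be linked in, which digests alone cannot provide.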
Effects of External Events
During the enactment of the challenge workflow, the hard disk becomes full, meaning that the output of softmean, Atlas Image, is an empty file, and the workflow then crashes.
- Find out the process that led to Atlas Image and why it is empty.
Unwritten Queries
There are two aspects of provenance discussed in the first challenge workshop for which I am unsure how to write practical queries such as those above. It would be good for those who understand the problem being suggested to add queries for these aspects, preferably tied to the challenge workflow.
- Inferring processes that occurred outside the workflow from the "studyModality" annotation
- Resolving conflicting causal relationships in the provenance data
--
SimonMiles - 04 Oct 2006