IC_2 Provenance implementation
Background
Short description
In close cooperation with EUDAT, a cross-RI provenance model (B2PROF) shall be developed. Use of this model shall be enabled in workflow engines (Taverna, dispel, GEF) and in services such as B2HANDLE (persistent identifiers), B2NOTE (annotation of data and metadata), B2SHARE (a data repository combined with a persistent identifier service and a metadata service) and others.
Contact
Background | Contact Person | Organization | Contact email
ICT / e-Infrastructure (Use case proposer, Agile group leader) | Barbara Magagna | Umweltbundesamt GmbH | Barbara.magagna@umweltbundesamt.at
ICT / e-Infrastructure | Marco Rorro | EUDAT | m.rorro@cineca.it
ICT / e-Infrastructure | Giovanni Morelli | EUDAT | g.morelli@cineca.it
Use case type
Implementation case
Scientific domain and communities
Scientific domain
[Atmosphere | Biosphere | hydrosphere | geosphere]
Across all domains.
Community
[Data Acquisition | Data Curation | Data Publication | Data Service Provision | Data Usage | Or others]
All of them
Behavior
Data Product Generation/Data Replication/Data Publication/Semantic Harmonisation/Data Discovery and Access/Data Citation
Roles
Data curator/Data publication repository/Service Provider
Detailed description
Objective and Impact
Enables tracking of the complete data life cycle and gives data users access to this information.
This helps to identify datasets that are derived from the same source but are displayed, published or processed in different ways or on different data services.
It also helps to acknowledge each step in the life cycle and the actor responsible for that step. This supports citation of these contributions and thus increases the willingness to publish data.
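The idea of recognising datasets that share a source can be illustrated with a minimal sketch. It assumes that derivations are recorded as simple (derived, source) pairs in the spirit of the PROV relation `wasDerivedFrom`; the dataset identifiers and the function names `root_sources` and `share_source` are hypothetical and not part of any existing service.

```python
# Hypothetical derivation records: (derived dataset, source dataset).
# The identifiers are placeholders, not real persistent identifiers.
WAS_DERIVED_FROM = [
    ("portal-a/plot", "repo/cleaned"),
    ("repo/cleaned", "sensor/raw"),
    ("portal-b/table", "sensor/raw"),
]


def root_sources(dataset, derivations):
    """Return the original sources a dataset ultimately derives from."""
    # Index the records as: derived dataset -> set of direct sources.
    parents = {}
    for derived, source in derivations:
        parents.setdefault(derived, set()).add(source)

    roots, stack, seen = set(), [dataset], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in faulty records
        seen.add(current)
        if current in parents:
            stack.extend(parents[current])
        else:
            roots.add(current)  # no recorded source: treat as original
    return roots


def share_source(a, b, derivations):
    """True if two datasets have at least one original source in common."""
    return bool(root_sources(a, derivations) & root_sources(b, derivations))
```

With the sample records above, `share_source("portal-a/plot", "portal-b/table", WAS_DERIVED_FROM)` is true, because both datasets trace back to `sensor/raw` even though they are published on different services.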
Challenges
- Find a common provenance schema (e.g. PROV)
- Harmonize existing schemata
- Map existing schemata to the common schema
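Mapping an existing schema to a common one could look roughly like the sketch below. It assumes W3C PROV as the common schema and uses only its core terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`). The legacy field names (`output_file`, `processing_step`, `operator`) and the function `to_prov` are invented for illustration; a real RI schema will differ.

```python
# Hypothetical field names of an existing, RI-specific provenance record,
# mapped to W3C PROV core types. Only the PROV terms are real.
FIELD_TO_PROV = {
    "output_file": "prov:Entity",
    "processing_step": "prov:Activity",
    "operator": "prov:Agent",
}


def to_prov(record):
    """Map a flat legacy record to PROV-style nodes and relations."""
    # Fail loudly on fields without a mapping instead of dropping them.
    unmapped = sorted(set(record) - set(FIELD_TO_PROV))
    if unmapped:
        raise ValueError(f"no PROV mapping for fields: {unmapped}")

    nodes = {FIELD_TO_PROV[field]: value for field, value in record.items()}
    relations = []
    if "prov:Entity" in nodes and "prov:Activity" in nodes:
        relations.append(
            ("prov:wasGeneratedBy", nodes["prov:Entity"], nodes["prov:Activity"])
        )
    if "prov:Activity" in nodes and "prov:Agent" in nodes:
        relations.append(
            ("prov:wasAssociatedWith", nodes["prov:Activity"], nodes["prov:Agent"])
        )
    return {"nodes": nodes, "relations": relations}
```

Raising an error for unmapped fields is a deliberate choice in this sketch: it makes gaps between an existing schema and the common schema visible during harmonisation rather than silently losing information.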
Detailed scenarios
tbd
Technical status and requirements
In development, based on existing provenance tracking in different projects (e.g. EUDAT, myExperiment, BioVeL, …)
Implementation plan and timetable
- Timeline: tbd
- Milestones:
  - Define the minimum information that has to be tracked
  - Find a conceptual model for provenance that covers the required information
  - Map existing models to the common model
  - Find a repository to store the provenance information
  - Establish a definition and implementation plan for services that should provide provenance information
  - Establish a definition and implementation plan for publication services so that provenance information is delivered together with data and metadata
- Involved RIs: LTER, ICOS, …
- Links to ENVRIplus work packages/tasks: 5.3, 5.4, 6.1, 8.1, 8.2, 8.3
- Allocation of resources: tbd
- Expected output and evaluation of output:
  - (1) Provenance information delivered together with data and metadata.
  - (2) Showcase for selected datasets coming from selected RIs (e.g. LTER).
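Expected output (1) can be pictured as a single response that carries all three parts. The sketch below is only an illustration under stated assumptions: the function `package`, the key names `data`, `metadata` and `provenance`, and the example values are hypothetical and do not describe the interface of B2SHARE or any other existing service.

```python
import json


def package(data_url, metadata, provenance):
    """Bundle a data reference with its metadata and provenance as JSON."""
    # Refuse to deliver data without its accompanying descriptions.
    for name, part in (("metadata", metadata), ("provenance", provenance)):
        if not part:
            raise ValueError(f"{name} must not be empty")
    bundle = {"data": data_url, "metadata": metadata, "provenance": provenance}
    return json.dumps(bundle, indent=2, sort_keys=True)


# Placeholder values for illustration only.
example = package(
    "https://example.org/datasets/1",
    {"title": "Example dataset"},
    [{"relation": "prov:wasDerivedFrom", "source": "https://example.org/datasets/0"}],
)
```

A publication service following this pattern would reject a delivery that lacks provenance, which is one possible way to make the milestone on publication services checkable.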