{{DISPLAYTITLE:IC_8 Catalogue, curation, provenance}}
 
 
== <span style="color: #BBCE00">Background</span> ==
 
  
 
=== <span style="color: #BBCE00">Use case type</span> ===
 
 
'''''Implementation case'''''
 
 
''Conditions:''
 
 
* ''Each RI describes its portfolio of new and/or enhanced services that they expect from ENVRIPLUS results, derived from the ENVRIPLUS WPs;''
 
* ''ENVRIPLUS staff work with the RIs on these descriptions, which in the course of the project will be gradually updated with more details.''
 
 
''Implications:''
 
 
* ''Implementation cases selected and adopted by interested RIs;''
 
* ''Both RIs and ENVRIPLUS invest in the actual implementation and associated services.''
 
 
=== <span style="color: #BBCE00">Scientific domain and communities</span> ===
 
 
'''<span style="color: #BBCE00">Scientific domain</span>'''
 
 
To be relevant in the ENVRIPLUS context, the implemented functions must be validated by at least two RIs, preferably from two different spheres (atmosphere, biosphere, geosphere, hydrosphere):
 
 
* Atmosphere: IAGOS
 
* Biosphere: ANAEE
 
* Geosphere: EPOS
 
* Hydrosphere: SeaDataNet, Euro-ARGO
 
 
 
'''<span style="color: #BBCE00">Community</span>'''
 
 
* The data acquisition community uses catalogues to describe its observation systems (e.g. an observation system catalogue).

* The data curation community maintains the catalogues and uses them as a management tool for the inventory of datasets (and other digital objects).

* The data publication community uses catalogues to parameterise its actions (e.g. minting a DOI for an object requires metadata).

* The data service provision community uses catalogues to configure its services, including workflow composition to retrieve input datasets and process them, but it might not be considered a first-priority target.

* The data usage community uses catalogues for discovery and, through contextualisation including provenance, for the assessment of relevance and quality and for traceability.

The data curation community is targeted as the provider of curation; the other communities are targeted as users of the curated information.
 
 
 
'''<span style="color: #BBCE00">Behavior</span>'''
 
 
The connected behaviours are:<br>
 
Data acquisition community:
 
 
* '''Instrument configuration and calibration''' need to be registered in catalogues.
 
* '''Data collection:''' data collected from sensors needs to be curated, or at least safeguarded at an early stage on replicated storage. The description of each observation needs to be pushed to the catalogue.

Data curation community:

* '''Data quality checking:''' the quality assessment performed on a dataset needs to be documented in the catalogue. Different versions of the same dataset, produced with different quality controls, need to be managed.
 
* '''Data preservation:''' the catalogue and datasets need to be preserved for the long term, with replicated copies, format maintenance and a data management plan (DMP).
 
* '''Data product generation:''' the input and output datasets of the products need to be managed in catalogues and curated. For provenance, the description of the product processing needs to be managed in the catalogue as well.
 
* '''Data replication''' is handled by the curation sub-use case.

Data publication community:
 
* '''Data publication:''' the information managed in the catalogue and the curated datasets should be clean enough for publication.
 
* '''Semantic harmonisation:''' the content of the catalogue and datasets will be syntactically homogeneous (format) and will rely on harmonised thesaurus references (e.g. SKOS) to support semantic harmonisation.
 
* '''Data discovery and access:''' data discovery will be enabled in a harmonised way. Visualisation and download access will be provided when available, but will not be harmonised. The data access provided here will not make use of the provenance information (user tracking, profiles, ...).
 
* '''Data citation:''' the metadata required for data citation will be available in the catalogue. The DOIs or PIDs of datasets will be described in the catalogue as well. However, creating DOIs is not managed by this use case.
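
As an illustration of the data citation behaviour above, the following minimal Python sketch checks whether a catalogue record carries the metadata needed for citation. The field names are hypothetical and only loosely modelled on the mandatory DataCite properties; the actual list would depend on the metadata profile chosen by the use case.

```python
from typing import Dict, List

# Hypothetical field names, loosely modelled on the mandatory DataCite
# properties; the real profile is not defined in this use case.
REQUIRED_CITATION_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
)


def missing_citation_fields(record: Dict[str, object]) -> List[str]:
    """Return the required citation fields that are absent or empty."""
    return [name for name in REQUIRED_CITATION_FIELDS if not record.get(name)]


# Illustrative record with a placeholder identifier (not a real DOI).
record = {
    "identifier": "doi:10.1234/example",
    "creators": ["Doe, Jane"],
    "title": "Example temperature profiles",
    "publisher": "",
}

print(missing_citation_fields(record))
```

A record with an empty list of missing fields could be handed to whatever external service mints the DOI, since minting itself is outside this use case.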
 
 
Data Service provision community:
 
 
* '''Service description and registration:''' services should be described in a catalogue; however, managing this information may not be a first priority in this use case.
 
* '''Service coordination and composition''' can use catalogues of datasets and services to schedule and organise services. However, as noted above, this is not a priority in the current use case.

Data usage community: the behaviours of this community will be supported by the catalogue, especially in the aspects of discovery, contextualisation (including provenance) and availability (through curation).
 
* '''User profile management:''' users are recorded in the catalogue and harmonised, through matching processes, with other user directories (e.g. ORCID) managed in the use case.
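
The provenance aspect of the behaviours above (in particular data product generation, where input and output datasets and the processing description are kept in the catalogue) can be sketched as a simple lineage lookup. This is only an illustration under assumed names: the <code>DERIVED_FROM</code> table and the entry identifiers are hypothetical and do not reflect any catalogue format chosen by the use case.

```python
from typing import Dict, List, Set

# Hypothetical catalogue content: each entry lists the entries it was
# directly derived from (the inputs of the processing that produced it).
DERIVED_FROM: Dict[str, List[str]] = {
    "product-monthly-mean": ["dataset-qc-v2"],
    "dataset-qc-v2": ["dataset-raw"],
    "dataset-raw": [],
}


def lineage(entry_id: str, derived_from: Dict[str, List[str]]) -> Set[str]:
    """Return every upstream entry that entry_id depends on, directly or not."""
    seen: Set[str] = set()
    stack = list(derived_from.get(entry_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited, also guards against cycles
        seen.add(current)
        stack.extend(derived_from.get(current, []))
    return seen


print(sorted(lineage("product-monthly-mean", DERIVED_FROM)))
```

Keeping only the direct inputs per entry is enough to reconstruct the full chain on demand, which is what traceability of a data product requires.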
 
 
== <span style="color: #BBCE00">Detailed description</span> ==
 
 
'''<span style="color: #BBCE00">Objective and Impact</span>'''
 
 
'''Catalogue'''<br>
 
The catalogue aims to provide functions that cut across RIs, to edit and discover the following items:
 
 
* systems for observation and processing (processing has the lowest priority)
 
* observation events and results (e.g. samples)
 
* datasets
 
* documents
 
* persons
 
* research objects (lowest priority)
 
 
<u>Action 1</u>: Persons and documents will be described and federated in pre-existing e-infrastructures, to be defined (e.g. ORCID, …), so as to fulfil the requirements of the provenance and curation functions.<br>
 
<u>Action 2</u>: Dataset descriptions will be federated by harvesting the dataset catalogue of each RI (in whatever 'standard' metadata format it uses) into a single entry point. The metadata format of this entry point is to be defined, chosen among DC, DCAT, INSPIRE/ISO19115, GeoNetwork, CKAN and CERIF, so as to fulfil the requirements of the provenance and curation functions.<br>
 
<u>Action 3</u>: Edition and discovery functions for observation systems, events and results (including collected samples) will be implemented by a combination of RI-specific tools and federated tools (e.g. for edition), so as to fulfil the requirements of the provenance and curation functions.
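
To make the scope of the catalogue more concrete, the item types and the edit/discover functions described above can be sketched as a small data model. This is a minimal sketch under stated assumptions: the class names, fields and the <code>discover</code> function are hypothetical, and the real metadata profile is still to be chosen.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """Item types listed for the catalogue; the names are illustrative."""
    SYSTEM = "system"
    OBSERVATION = "observation"
    DATASET = "dataset"
    DOCUMENT = "document"
    PERSON = "person"
    RESEARCH_OBJECT = "research object"


@dataclass
class CatalogueEntry:
    identifier: str   # local or persistent identifier
    kind: Kind
    title: str
    source_ri: str    # RI that provided the description
    related: List[str] = field(default_factory=list)  # linked entry identifiers


def discover(entries: List[CatalogueEntry],
             kind: Optional[Kind] = None,
             text: str = "") -> List[CatalogueEntry]:
    """Filter entries by item type and case-insensitive title substring."""
    needle = text.lower()
    return [e for e in entries
            if (kind is None or e.kind is kind) and needle in e.title.lower()]


# Made-up entries from a placeholder RI, for illustration only.
entries = [
    CatalogueEntry("person:1", Kind.PERSON, "Jane Doe", "RI-A"),
    CatalogueEntry("system:7", Kind.SYSTEM, "Profiling float 7", "RI-A",
                   related=["person:1"]),
    CatalogueEntry("dataset:42", Kind.DATASET, "Float 7 temperature profiles",
                   "RI-A", related=["system:7", "person:1"]),
]

print([e.identifier for e in discover(entries, text="float")])
```

The <code>related</code> links are what would let a dataset point back to the observation system and persons involved, which the provenance and curation functions rely on.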
 
 
 
'''<span style="color: #BBCE00">Challenges</span>'''
 
 
The main challenge is the involvement of the RIs, from the definition of the functions to the adoption of the solution.
 
 
 
'''<span style="color: #BBCE00">Detailed scenarios</span>'''
 
 
In the context of the three actions above:
 
 
# Define the curation and provenance functions to be provided, and identify the related requirements on the catalogue (format and access API).
 
# Define the catalogue requirements for discovery and access.
 
# Define the metadata profile and access API.
 
# Implement the centralised or federated solution.
 
 
As in Agile development, the steps can be iterated, with a new iteration for each newly identified requirement or newly supported RI.
 
 
[[File:Worddav391ee37988e66979fe6cd5249485105f.png]]
 
 
'''<span style="color: #BBCE00">Technical status and requirements</span>'''
 
 
E-infrastructures that manage catalogues of persons and documents already exist; they are available through standard interfaces and cut across RIs.<br>
 
Catalogues of datasets are generally provided by the RIs, and their content is available through standard interfaces. Some off-the-shelf tools and standards are available to implement the catalogue of datasets (DC, DCAT, INSPIRE/ISO19115, GeoNetwork, CKAN, CERIF). ENVRIPLUS needs to federate them by using the richest available 'standard' and providing mappings to the others.<br>
 
Catalogues of observation systems, events or samples may exist in the RIs, but they are seldom, if ever, accessible through standard interfaces. Some RIs lack proper tools to manage this information, which is nevertheless critical for the quality and traceability of scientific results.
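
The reason for federating on the richest available 'standard' and mapping to the others can be illustrated with a small crosswalk. The sketch below maps a record in a hypothetical common profile to Dublin Core element names and reports which fields have no counterpart. The source field names and the mapping table are assumptions made for illustration, not a mapping agreed by the use case.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: common-profile field -> Dublin Core element.
TO_DUBLIN_CORE: Dict[str, str] = {
    "title": "title",
    "authors": "creator",
    "abstract": "description",
    "keywords": "subject",
    "issued": "date",
    "pid": "identifier",
}


def to_dublin_core(record: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Map a common-profile record to Dublin Core.

    Returns the mapped record and the sorted names of fields that could
    not be mapped, i.e. information lost in the simpler format.
    """
    mapped: Dict[str, object] = {}
    dropped: List[str] = []
    for name, value in record.items():
        target = TO_DUBLIN_CORE.get(name)
        if target is None:
            dropped.append(name)
        else:
            mapped[target] = value
    return mapped, sorted(dropped)


# Made-up record; "qc_level" and "instrument" stand for richer metadata.
record = {
    "pid": "example-pid-1",
    "title": "Example dataset",
    "authors": ["Doe, Jane"],
    "qc_level": "validated",
    "instrument": "system:7",
}

mapped, dropped = to_dublin_core(record)
print(mapped)
print(dropped)
```

Mapping from the richer profile to a simpler one drops fields, whereas the reverse direction only leaves fields empty, so keeping the richest format as the hub avoids losing information in the central catalogue.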
 
 
 
'''<span style="color: #BBCE00">Implementation plan and timetable</span>'''
 
 
'''Documents and persons'''<br>
 
E-infrastructures that manage catalogues of persons and documents already exist.<br>
 
The implementation case will define a list of official, sustainable person and document repositories that RIs should use to describe their resources, and will define mappings to and from the ENVRIPLUS catalogue metadata standard (once chosen).<br>
 
''Expected result in October 2016''
 
 
'''Datasets'''<br>
 
The implementation case will identify the catalogues of datasets in the RIs and analyse their machine-to-machine interfaces for harvesting purposes. A single tool will harvest them centrally. Their metadata will then be converted from the local RI format to that of the ENVRIPLUS central catalogue, as described above.<br>
 
''Expected result in October 2017''
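
The harvest-then-convert flow described for datasets can be sketched as follows. This is a minimal sketch, assuming that each RI is represented by a fetch function and a converter to the central format. The RI names, record layouts and function names are hypothetical, and the fetch functions return in-memory data instead of calling a real machine-to-machine interface.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Source = Tuple[Callable[[], Iterable[Record]], Callable[[Record], Record]]


def convert_ri_a(raw: Record) -> Record:
    """Convert the (hypothetical) RI-A layout to the central layout."""
    return {"pid": raw["id"], "title": raw["name"]}


def convert_ri_b(raw: Record) -> Record:
    """Convert the (hypothetical) RI-B layout to the central layout."""
    return {"pid": raw["identifier"], "title": raw["dc_title"]}


def harvest(sources: Dict[str, Source]) -> Tuple[List[Record], List[Tuple[str, str]]]:
    """Harvest every source centrally and convert records to one format.

    Records that cannot be converted are reported, not silently dropped.
    """
    catalogue: List[Record] = []
    errors: List[Tuple[str, str]] = []
    for ri_name, (fetch, convert) in sources.items():
        for raw in fetch():
            try:
                record = convert(raw)
            except KeyError as exc:
                errors.append((ri_name, f"missing field {exc}"))
                continue
            record["source_ri"] = ri_name  # keep track of the origin
            catalogue.append(record)
    return catalogue, errors


# In-memory stand-ins for the RI catalogue interfaces (made-up data).
sources: Dict[str, Source] = {
    "RI-A": (lambda: [{"id": "a-1", "name": "Dataset A1"}], convert_ri_a),
    "RI-B": (lambda: [{"identifier": "b-1", "dc_title": "Dataset B1"},
                      {"identifier": "b-2"}], convert_ri_b),
}

catalogue, errors = harvest(sources)
print(len(catalogue), len(errors))
```

Recording the originating RI on each harvested record keeps the central catalogue traceable back to the local catalogue it came from.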
 
 
'''Observation systems, events or samples'''<br>
 
An integrated system will show observation systems, events and collected samples from two or three RIs in the liquid (EMSO, ARGO), solid (EPOS) and gas (ICOS) spheres.<br>
 
Tools will be provided to easily edit the descriptions, for RIs that do not yet have their own system.<br>
 
As before, this will require mapping the metadata that describes systems, events and samples at each RI to the common metadata standard of ENVRIPLUS.<br>
 
''Expected result in October 2018''
 
 
 
'''<span style="color: #BBCE00">Expected output and evaluation of output</span>'''
 
 
'''Documents and persons'''<br>
 
Number of RIs actually using the chosen person and document e-infrastructure to identify their resources.
 
 
'''Datasets'''<br>
 
Number of RIs whose dataset descriptions are available in the federated system.<br>
 
Number of users of the federated dataset catalogue (inside or outside the RIs).
 
 
'''Observation systems, events or samples'''<br>
 
Number of observation systems whose events and results are actually available in the federated catalogue.<br>
 
Number of users of the catalogues in support of activities within the RIs.
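
Some of the indicators above can be derived directly from the content of the federated catalogue. The sketch below counts the RIs and observation systems represented in a list of records. The record layout (<code>source_ri</code>, <code>kind</code>) is a hypothetical one, and the sample records are made up for illustration; user counts would come from access logs, which are not modelled here.

```python
from typing import Dict, Iterable, List


def coverage_indicators(records: Iterable[Dict[str, str]],
                        participating_ris: List[str]) -> Dict[str, int]:
    """Count RIs and observation systems present in the federated catalogue."""
    records = list(records)
    ris_with_datasets = {r["source_ri"] for r in records if r["kind"] == "dataset"}
    systems = [r for r in records if r["kind"] == "system"]
    return {
        "ris_total": len(participating_ris),
        "ris_with_datasets": len(ris_with_datasets & set(participating_ris)),
        "observation_systems": len(systems),
    }


# Made-up records; they do not reflect actual catalogue content.
records = [
    {"source_ri": "EPOS", "kind": "dataset"},
    {"source_ri": "EPOS", "kind": "system"},
    {"source_ri": "ICOS", "kind": "dataset"},
    {"source_ri": "ICOS", "kind": "dataset"},
]

print(coverage_indicators(records, ["EPOS", "ICOS", "SeaDataNet"]))
```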
 
 
== <span style="color: #BBCE00">External Links</span> ==
 
# IC_8 notebook: [https://envriplus.manageprojects.com/projects/wp9-service-validation-and-deployment-1/notebooks/659 https://envriplus.manageprojects.com/projects/wp9-service-validation-and-deployment-1/notebooks/659]
 
 
 
[[Category:Use Cases]]
 
