Identification and Citation for AnaEE


Latest revision as of 19:40, 30 March 2020

Context of identification and citation in AnaEE

AnaEE is still in its preparatory phase. As the AnaEE representatives pointed out, many of the questions on this topic could therefore only be answered in a very preliminary way.

Summary of AnaEE's requirements for identification and citation

Detailed requirements

The following information was contributed via e-mail by the RI representatives directly to the topic coordinator Maggie Hellström.

IDENTIFICATION

1) What granularity do your RI’s data products have:

a) Content-wise (all parameters together, or separated e.g. by measurement category)?

The data are collected in distributed site databases, some of which may be gathered at the national level. A querying interface allows data to be retrieved flexibly at different levels (from a single parameter to all the data of a given site or experiment, or even data from several sites). We have two kinds of data sets:

- from long-term experiments, where the data are collected in a site database;

- from short-term experiments, such as those under controlled conditions in ECOTRON, where the data are gathered in a project database.
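
The flexible querying described in this answer can be pictured with a small sketch. It is only an illustration under stated assumptions: the record fields (`site`, `experiment`, `parameter`), the function `select`, and all sample values are hypothetical and do not describe the actual AnaEE information system.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Record:
    site: str         # hypothetical site name
    experiment: str   # hypothetical experiment label
    parameter: str    # hypothetical measured parameter
    value: float


def select(records: List[Record],
           sites: Optional[Set[str]] = None,
           experiment: Optional[str] = None,
           parameter: Optional[str] = None) -> List[Record]:
    """Return the records matching every filter that is set (None = no restriction)."""
    return [
        r for r in records
        if (sites is None or r.site in sites)
        and (experiment is None or r.experiment == experiment)
        and (parameter is None or r.parameter == parameter)
    ]


# Invented sample data, for illustration only.
RECORDS = [
    Record("site_A", "long_term_1", "soil_temperature", 11.2),
    Record("site_A", "long_term_1", "soil_moisture", 0.31),
    Record("site_B", "ecotron_run_7", "soil_temperature", 14.8),
]

# From a single parameter at one site ...
one_parameter = select(RECORDS, sites={"site_A"}, parameter="soil_temperature")
# ... to all the data of a site ...
whole_site = select(RECORDS, sites={"site_A"})
# ... or one parameter across several sites.
across_sites = select(RECORDS, sites={"site_A", "site_B"}, parameter="soil_temperature")
```

Leaving a filter unset widens the selection, which is one simple way to cover the range from a single parameter to a multi-site extract with a single entry point.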

b) Temporally (yearly, monthly, daily, or other)?

Yearly, monthly, daily, hourly, and sometimes at a higher temporal resolution.

c) Spatially (by measurement station, region, country or all together)?

By measurement station or by network of stations. We do not produce data products that are representative of an area, at any scale.

2) How are the data products of your RI stored - as separate “static” files, in a database system, or a combination?

Mainly in database information systems.

3) How does your RI treat the “versioning” of data - are older datasets simply replaced by updates, or are several versions kept accessible in parallel? How do you identify different versions of the same dataset?

Not yet addressed. Data production is starting at the national level, in France. The intention is to expose the latest updates in the database system. Published data, however, will be versioned according to updates of their content.
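
One way to picture versioning "according to updates of their content" is to derive a new version only when the published content actually changes. The sketch below is an assumption, not AnaEE's design (versioning had not yet been addressed when this answer was given): the class `PublishedDataset`, the function `content_digest`, and the sample rows are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


def content_digest(rows: List[Dict]) -> str:
    """SHA-256 over a canonical JSON serialisation of the rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class PublishedDataset:
    name: str
    # One digest per published version; index 0 corresponds to version 1.
    versions: List[str] = field(default_factory=list)

    def publish(self, rows: List[Dict]) -> int:
        """Record a new version only if the content changed; return the current version number."""
        digest = content_digest(rows)
        if not self.versions or self.versions[-1] != digest:
            self.versions.append(digest)
        return len(self.versions)


# Invented sample data, for illustration only.
rows_v1 = [{"time": "2016-03-01", "value": 1.0}]
rows_v2 = rows_v1 + [{"time": "2016-03-02", "value": 1.4}]

dataset = PublishedDataset("hypothetical_site_dataset")
first = dataset.publish(rows_v1)        # new content: version 1
unchanged = dataset.publish(rows_v1)    # same content: still version 1
updated = dataset.publish(rows_v2)      # changed content: version 2
```

Under this scheme the database can keep exposing the latest state, while a published version stays tied to a fixed digest of its content.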

4) Is it important to your data users that:

a) Every digital data object is tagged with a unique & persistent digital identifier (PID)?

Not yet. The intention is to have PIDs at the data set level (a site, an experiment). The approach is not yet mature for finer-grained descriptions (e.g. parameter, variable …). However, we are working on annotating the data using an ontological approach, which would lead to unique identification of every parameter.

b) The metadata for data files contains checksum information for the objects?

Not yet applicable.

c) Metadata (including any documentation about the data object contents) is given its own persistent identifier?

Not yet. DOIs will be used.

d) Metadata and data objects can be linked persistently by means of PIDs?

Not yet applicable.

5) Is your RI currently using, or planning to use, a standardized system based on persistent digital identifiers (PIDs) for:

a) “Raw” sensor data?

Not yet decided. However, it is highly probable that raw data will be stored together with processed data, so that users can reprocess them.

b) Physical samples?

Not yet implemented. The plan is to persistently annotate the different objects on which observations are made (soil sample, soil layer, plot, tree, animal …).

c) Data undergoing processing (QA/QC etc.)?

Not yet implemented. The intention is to define different levels of processing (L0, L1, L2, L3 …) and to have an array with quality codes. Some of the levels (not necessarily all) will need a PID.
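
The idea of processing levels combined with an array of quality codes could look roughly like the sketch below. Everything in it is an assumption made for illustration: which levels receive a PID (`LEVELS_WITH_PID`), the convention that quality code 0 means "good", and the class and field names are hypothetical, since nothing had been implemented when this answer was given.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Level(Enum):
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3


# Assumption: only some levels are given a PID; this choice is invented.
LEVELS_WITH_PID = {Level.L2, Level.L3}


@dataclass
class DataProduct:
    level: Level
    values: List[float]
    quality: List[int]           # one quality code per value; 0 = good (hypothetical convention)
    pid: Optional[str] = None    # assigned later, if the level requires one

    def __post_init__(self) -> None:
        if len(self.values) != len(self.quality):
            raise ValueError("values and quality must have the same length")

    def needs_pid(self) -> bool:
        """True if this level should carry a PID and none has been assigned yet."""
        return self.level in LEVELS_WITH_PID and self.pid is None

    def good_values(self) -> List[float]:
        """Values whose quality code marks them as good."""
        return [v for v, q in zip(self.values, self.quality) if q == 0]


# Invented sample data, for illustration only.
raw = DataProduct(Level.L0, [1.0, 2.0], [0, 1])
final = DataProduct(Level.L3, [1.0, 2.0, 3.0], [0, 2, 0])
```

Keeping the quality codes in an array parallel to the values lets a user filter or reprocess the data without losing the original measurements.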

d) Finalized “publishable” data?

Not yet decided.

6) Please indicate the kind of identifier system that you are using - e.g. Handle-based (EPIC or DOI), UUIDs or your own RI-specific system?

Not yet. Our plan is to use DOIs for published data sets and our own specific system for descriptions at the parameter level.

7) If you are using Handle-based PIDs, are these handles pointing to “landing pages”? If so, are these pages maintained by your RI or an external organization (like the data centre used for archiving)?

Not yet decided.

8) Are costs associated with PID allocation and maintenance (of landing pages etc.) specified in your RI’s operational cost budget?

Not yet addressed.


CITATION

9) How does your “designated scientific community” (typical data users) primarily use your data products? As input for modelling, or for comparisons?

Both

10) Does your primary user community traditionally refer to datasets they use in publications:

a) By providing information about producer, year, report number if available, title or short description in the running text (e.g. under Materials and Methods)?

Yes, in Materials and Methods, with an appropriate reference and acknowledgement.

b) By adding information about producer, year, report number if available, title or short description in the References section?

See previous

c) By DOIs, if available, in the References section?

Not widely yet, but they could be used.

d) By using other information?

No other known practices

e) By providing the data as supplementary information, either complete or via a link

Yes

11) Is it important to your data users to be able to refer to specific subsets of the data sets in their citation? Examples:

a) Date and time intervals

yes

b) Geographic selection

yes

c) Specific parameters or observables

yes

d) Other

Data quality, accuracy.
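
A citation that refers to a specific subset (time interval, sites, parameters) could be assembled from a data set PID plus the selection criteria. The sketch below is only an illustration: the class `Subset`, the format of the citation string, and the placeholder identifier `doi:10.0000/hypothetical` are invented and do not correspond to an existing AnaEE practice or to a registered DOI.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Subset:
    dataset_pid: str              # PID of the whole data set (placeholder value below)
    start: date                   # first day of the cited interval
    end: date                     # last day of the cited interval
    sites: Tuple[str, ...]        # geographic selection, here by site name
    parameters: Tuple[str, ...]   # selected parameters or observables

    def citation_suffix(self) -> str:
        """Render the subset in a stable, human-readable form (hypothetical format)."""
        sites = ", ".join(sorted(self.sites))
        parameters = ", ".join(sorted(self.parameters))
        return (
            f"{self.dataset_pid} "
            f"[subset: {self.start.isoformat()}/{self.end.isoformat()}; "
            f"sites: {sites}; parameters: {parameters}]"
        )


# Invented example values, for illustration only.
subset = Subset(
    dataset_pid="doi:10.0000/hypothetical",
    start=date(2015, 1, 1),
    end=date(2015, 12, 31),
    sites=("site_B", "site_A"),
    parameters=("soil_moisture",),
)
text = subset.citation_suffix()
```

Sorting the sites and parameters makes the rendered text independent of the order in which the user selected them, so the same subset always yields the same citation string.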

12) Is it important to be able to refer to many separate datasets in a collective way, e.g. having a collection of “all data” from your RI represented by one single DOI?

Yes, at the site level or for an experiment that produced several datasets. Not necessarily for the whole RI.

13) What strategy does your RI have for collecting information about the usage of your data products?

Not yet fully defined.

It is expected to include registration of users (an account in the information system), download tracking, identification in scientific publications, and citation (DOI, publication/report transmission …).

a) Downloads/access requests

Yes, access requests

b) Visualization at your own data portal

Not yet defined

c) Visualization at other data portals

No.

d) References in scientific literature

Yes

e) References in non-scientific literature

Yes if easily collected

f) Scientific “impact”

To be defined

14) Who receives credit when a dataset from your RI is cited?

a) The RI itself

Yes

b) The RI’s institutional partners (all or in part, depending on the dataset contents)

Yes

c) Experts in the RI’s organization (named individuals)

no

d) “Principal investigators” in charge of measurements or data processing (named individuals)

yes.

e) Staff (scientists, research engineers etc.) performing the measurements or data processing (named individuals)

yes

15) What steps in tooling, automation and presentation do you consider necessary to improve take up of identification and citation facilities and to reduce the effort required for supporting those activities?

How to deal with incremental datasets?

How to link ontology-based annotations and PIDs?

Formalities (who & when)

Go-between: ??

Questionnaire response received by topic coordinator Maggie Hellström

RI representative: Christian Pichot and André Chanzy

Period of requirements collection: March 2016

Status: Information gathered, no analysis done yet