Methodology report for handling of data heterogeneity
This deliverable relates to the work carried out in Task T2.2, Time-series heterogeneities: innovative user services (Task leader: INGV[EMSO]; Participants: IFREMER[EURO-ARGO], UiT[ESONET-VI], UvA[LIFEWATCH], NERC[FixO3]), of WP2.
After setting a common vocabulary and terminology, the deliverable analyses the most recurrent sources of heterogeneity that affect time-series despite the standardisation usually applied within Research Infrastructures, and focuses on two of them: data gaps and breaking points.
For a feasibility study on applying heterogeneity detection methods, the time-series provided by Research Infrastructures were classified as 'very-long time-series', lasting from one year to several years (typical of parameters related to global changes), and 'short time-series', lasting from a few seconds to some months (typical of parameters related to abrupt phenomena).
The methodologies used in the feasibility study were borrowed from geophysics. The Probability Density Function of the Power Spectral Density of the time-series is computed for data gap detection, and the ratio between the Short-Time Average and the Long-Time Average (STA/LTA) is shown as an example for detecting breaking points (the start times of heterogeneities). A brief description of the methods is given together with basic references. The treatment of heterogeneities depends strongly on the features of the corresponding parameters, the measurement site, and the modelling scale, so a trans-disciplinary approach to the various ENVRIplus time-series deserves deeper analysis.
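As an illustration of the breaking point detection idea, the ratio between the Short-Time Average and the Long-Time Average can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the deliverable: it assumes the classic (non-recursive) form computed on the squared signal with trailing windows, and the function names `sta_lta` and `first_trigger`, the window lengths, the threshold, and the synthetic test signal are all hypothetical choices made for the example.

```python
import numpy as np

def sta_lta(x, n_sta, n_lta):
    """Classic STA/LTA ratio of the squared signal, using trailing windows.

    Samples that do not yet have a full long-time window are set to 0.
    """
    x = np.asarray(x, dtype=float)
    if not (0 < n_sta < n_lta <= x.size):
        raise ValueError("need 0 < n_sta < n_lta <= len(x)")
    energy = x ** 2
    # Cumulative sum with a leading zero gives window sums by subtraction.
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(x.size)
    idx = np.arange(n_lta - 1, x.size)  # indices with a full LTA window
    sta = (csum[idx + 1] - csum[idx + 1 - n_sta]) / n_sta
    lta = (csum[idx + 1] - csum[idx + 1 - n_lta]) / n_lta
    # Guard against division by zero on an all-zero long window.
    ratio[idx] = sta / np.maximum(lta, np.finfo(float).tiny)
    return ratio

def first_trigger(ratio, threshold):
    """Index of the first sample whose ratio exceeds the threshold, or None."""
    above = np.flatnonzero(ratio > threshold)
    return int(above[0]) if above.size else None

# Synthetic series (assumption): unit-variance noise whose variance
# increases abruptly at sample 1200, standing in for a breaking point.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 2000)
signal[1200:] += rng.normal(0.0, 8.0, 800)

ratio = sta_lta(signal, n_sta=20, n_lta=400)
onset = first_trigger(ratio, threshold=4.0)
print("estimated breaking point at sample:", onset)
```

The short window reacts quickly to a change in signal energy while the long window tracks the background level, so the ratio rises sharply near the start of a heterogeneity. In practice the window lengths and the threshold would have to be tuned to the sampling rate and to the parameter being measured.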
The promising results obtained across disciplines and domains support the proposal to implement services that help scientists and data managers select the most suitable data for their original analyses. In shared virtual environments (i.e., cloud computing), such a service can provide basic processing tools for time-series in different domains, based on the proposed methodologies for heterogeneity detection. It can assist data managers in regular Quality Assessment/Quality Check procedures and support scientists in accepting, discarding, or correcting data before the final data selection.
|Title||D2.2 Methodology report for handling of data heterogeneity|
|Work package||WORK PACKAGE 2 – Metrology, quality and harmonization|
|Authors||Laura Beranzoli (firstname.lastname@example.org) (INGV)|
|||Mariagrazia De Caro (Mariagrazia.email@example.com) (INGV)|
|||Caterina Montuori (firstname.lastname@example.org) (INGV)|
|||Vito Vitale (email@example.com) (CNR)|
|||Mauro Mazzola (firstname.lastname@example.org) (CNR)|
|||Boyan Petkov (email@example.com) (CNR)|
|||Herve Petetin (Herve.Petetin@aero.obs-mip.fr) (OBSERVATOIRE MIDI-PYRÉNÉES)|
|||Justin Buck (firstname.lastname@example.org) (NOCS)|
|||Cathrine Lund Myhre (Cathrine.Lund.Myhre@nilu.no) (NILU)|
|Accepted by||Jean-Daniel Paris (WP 2 leader)|
|Deliverable due date||2017-04-28/M24|
|Actual Date of Submission||2017-05-02/M25|
|Project internal reviewer||Ari Asmi (University of Helsinki)|
|15.04.2017||Draft for comments|
|30.04.2017||Accepted by J.-D. Paris, V. Vitale, A. Asmi|