Methodology report for handling of data heterogeneity

Abstract

D2.2 Methodology report for handling of data heterogeneity
Project ENVRIplus
Deliverable nr D2.2
Submission date 2017-05-02
Type Report

Link | PDF | Zenodo

Document metadata

This deliverable relates to the work carried out in task T2.2 - Time-series heterogeneities: innovative user services (Task leader: INGV[EMSO], Participants: IFREMER[EURO-ARGO], UiT[ESONET-VI], UvA[LIFEWATCH], NERC[FixO3]) of WP2.

After setting out a common vocabulary and terminology, the deliverable analyses the most recurrent sources of heterogeneity that affect time-series despite the standardisation usually applied within Research Infrastructures, and focuses on two of them: data gaps and breaking points.

For a feasibility study on applying heterogeneity-detection methods, the time-series provided by the Research Infrastructures have been classified as 'very-long time-series', lasting from one year to several years (typical of parameters related to global change), and 'short time-series', lasting from a few seconds to some months (typical of parameters related to abrupt phenomena).

The methodologies used in the feasibility study on heterogeneity detection in time-series from Research Infrastructures have been borrowed from geophysics. The Probability Density Function (PDF) of the Power Spectral Density (PSD) of the time-series is computed for data gap detection, and the ratio between the Short-Time Average (STA) and the Long-Time Average (LTA) is shown as an example for detecting breaking points (the start times of heterogeneities). A brief description of the methods is given, together with basic references. The treatment of heterogeneities is considered highly dependent on the features of the corresponding parameters, the measurement site and the modelling scale, and a trans-disciplinary approach to the various ENVRIplus time-series deserves deeper analysis.
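
The PSD-PDF approach to gap detection can be sketched as follows. This is a minimal, simplified illustration using only the Python standard library, in the spirit of the probabilistic PSD used in seismic noise studies (McNamara and Buland, 2004); it is not the procedure used in the deliverable. The series is cut into non-overlapping segments, a Hann-tapered periodogram (naive DFT) is computed for each, and the dB values at each frequency are histogrammed across segments to give an empirical PDF. The segment length, the 1 dB bin width and all function names are assumptions, and `flag_gap_segments` is a hypothetical rule that flags segments whose mean power lies more than 20 dB below the median, as zero-filled or flat-lined gaps do.

```python
import cmath
import math
import random
import statistics
from collections import Counter


def periodogram_db(seg, fs=1.0, floor=1e-20):
    """One-sided, Hann-tapered periodogram of one segment, in dB (naive DFT)."""
    n = len(seg)
    mean = sum(seg) / n
    # Hann taper reduces spectral leakage between frequency bins
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
    s2 = sum(v * v for v in w)  # taper power, used to normalise the PSD
    y = [(v - mean) * wk for v, wk in zip(seg, w)]
    out = []
    for f in range(1, n // 2):  # skip DC (removed by demeaning) and Nyquist
        X = sum(y[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        p = 2.0 * abs(X) ** 2 / (fs * s2)
        # a flat (zero-filled or constant) segment has zero power: clamp before log
        out.append(10.0 * math.log10(max(p, floor)))
    return out


def psd_pdf(x, seg_len, fs=1.0, bin_db=1.0):
    """Split x into non-overlapping segments; return the per-segment spectra and,
    for each frequency bin, the empirical PDF (histogram) of its dB values."""
    segs = [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]
    spectra = [periodogram_db(s, fs) for s in segs]
    pdf = []
    for j in range(len(spectra[0])):
        counts = Counter(math.floor(sp[j] / bin_db) * bin_db for sp in spectra)
        pdf.append({b: c / len(spectra) for b, c in sorted(counts.items())})
    return spectra, pdf


def flag_gap_segments(spectra, drop_db=20.0):
    """Hypothetical rule: flag segments whose mean power is more than drop_db
    below the median segment power."""
    levels = [statistics.mean(sp) for sp in spectra]
    ref = statistics.median(levels)
    return [i for i, lv in enumerate(levels) if lv < ref - drop_db]


# synthetic white-noise series of 8 segments x 64 samples
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8 * 64)]
x[5 * 64:6 * 64] = [0.0] * 64  # simulated zero-filled data gap in segment 5
spectra, pdf = psd_pdf(x, seg_len=64)
gaps = flag_gap_segments(spectra)
print(gaps)  # [5]
```

Production seismological tools (for example ObsPy's `PPSD` class) use overlapping windows and average the spectra in octave bands; the naive DFT here costs O(n²) per segment and is only meant for short illustrative segments.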
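
The STA/LTA ratio for breaking-point detection can be sketched in the same way. This minimal Python version is an assumption, not the deliverable's code: it computes the classic ratio on the signal energy (squared samples) with trailing windows that both end at the current sample, then reports onsets with a two-threshold trigger. The window lengths (`n_sta`, `n_lta`), the thresholds (`on`, `off`) and the synthetic variance jump are illustrative choices.

```python
def sta_lta(x, n_sta, n_lta):
    """Classic STA/LTA ratio on the energy x**2, using trailing windows that
    both end at sample i. Samples before the first full LTA window get 0.0."""
    if not 0 < n_sta < n_lta <= len(x):
        raise ValueError("need 0 < n_sta < n_lta <= len(x)")
    csum = [0.0]
    for v in x:
        csum.append(csum[-1] + v * v)  # running energy sum: O(1) window means
    ratio = [0.0] * len(x)
    for i in range(n_lta - 1, len(x)):
        sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
        lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio


def trigger_onsets(ratio, on=2.0, off=1.5):
    """Indices where the ratio first rises above `on`; the trigger re-arms
    only after the ratio falls back below `off` (hysteresis)."""
    onsets, active = [], False
    for i, r in enumerate(ratio):
        if not active and r > on:
            onsets.append(i)
            active = True
        elif active and r < off:
            active = False
    return onsets


# synthetic series: variance jumps by a factor of 25 at sample 200
x = [(-1) ** i * 1.0 for i in range(200)] + [(-1) ** i * 5.0 for i in range(50)]
ratio = sta_lta(x, n_sta=10, n_lta=100)
print(trigger_onsets(ratio))  # [200]
```

Because the ratio reacts to changes in energy, a shift in the mean level rather than in variance would typically be handled by applying it to a differenced or detrended series. For seismic data, ObsPy provides `classic_sta_lta` and `trigger_onset` in `obspy.signal.trigger`.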

The promising results obtained across disciplines and domains support the proposal to implement services that help scientists and data managers select the most suitable data for their original analyses. In shared virtual environments (e.g., cloud computing), such a service can provide basic time-series processing tools for different domains, based on the proposed heterogeneity-detection methods. It can assist data managers in routine Quality Assessment/Quality Check procedures and support scientists in accepting, discarding or correcting data before the final data selection for their original analyses.
