IV Information Objects

Revision as of 00:25, 27 March 2020 by ENVRIwiki (talk | contribs) (Information Object Definitions)
The IV of the ENVRI RM defines two main types of information objects: Data and Metadata.

Information objects are used to model the various types of data and metadata manipulated by the RI. The IV information objects can be grouped as follows.

  • Data: research data processed by the RI:
    • Persistent data
    • Unique identifiers for data identification
    • Backup (of data)
  • Metadata: data that complement the research data with more precise details, typically related to the design of observation and measurement models:
    • Design specification of the observation and measurement
    • Description of the measurement procedure
    • Quality Assurance (QA) annotations
    • Concepts from a conceptual model, e.g. an ontology
    • Mapping rules which are used for the model-to-model transformations
    • Provenance records
    • Management metadata (The data used to identify the states of data and metadata objects)
  • Information Object Definitions
    • backup
    • mapping rule
    • citation
    • concept
    • conceptual model
    • data
    • data provenance
    • measurement result
    • metadata
    • metadata catalogue
    • metadata state
    • persistent data
    • persistent data state
    • qa notation
    • specification of investigation design
    • specification of measurements or observations
    • unique identifier (UID)
Figure: Information Object Types (IVObjectTypes.png)

Information Object Definitions

backup

A copy of (persistent) data so it may be used to restore the original after a data loss event.

mapping rule

Configuration directives used for model-to-model transformation.

Mapping rules can be transformation rules for:

  • arithmetic values (mapping from one unit to another)
    ranging from linear functions such as k·x + d to multivariate functions
  • ordinal and nominal values
    e.g. transforming classifications according to a classification system A to classification system B
  • data descriptions (metadata or Semantic Annotation or QA annotation)
  • parameter names and descriptions (can be n:m)
  • method names and descriptions
  • sampling descriptions
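
The first two kinds of rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the Celsius-to-Kelvin rule and the land-cover labels are hypothetical and do not come from the ENVRI RM.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LinearMappingRule:
    """Maps an arithmetic value x to k * x + d, e.g. a unit conversion."""
    k: float
    d: float

    def apply(self, x: float) -> float:
        return self.k * x + self.d


@dataclass
class NominalMappingRule:
    """Maps class labels of classification system A to system B."""
    table: Dict[str, str]

    def apply(self, label: str) -> str:
        # Raises KeyError for labels that have no counterpart in system B.
        return self.table[label]


# Hypothetical rules, for illustration only.
celsius_to_kelvin = LinearMappingRule(k=1.0, d=273.15)
land_cover_a_to_b = NominalMappingRule({"forest": "F1", "grassland": "G2"})

print(celsius_to_kelvin.apply(25.0))
print(land_cover_a_to_b.apply("forest"))
```

Keeping each rule as a small data object means it can be stored and exchanged as configuration, which matches the description of mapping rules as configuration directives.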

citation

A published, resolvable token linking to a persistent data object via an identifier.

In information technology terms, a citation is a reference to published data which may include the information related to:

  • the data source(s)
  • the owner(s) of the data source(s)
  • a description of the evaluation process, if available
  • a timestamp marking the access time to the data sources, thus reflecting a certain version
  • the equipment used for collecting the data (individual sensor or sensor network)

It is important that the citation is resolvable, which means that the identifiers point to live data sets and that the meaning of the items above is made clear.
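
As a minimal sketch, the items a citation may include could be held in a simple record type. The field names and example values below are assumptions made for illustration; the ENVRI RM does not prescribe a concrete structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Citation:
    """A reference to published data (field names are illustrative)."""
    identifier: str                                  # identifier of the persistent data object
    data_sources: List[str] = field(default_factory=list)
    owners: List[str] = field(default_factory=list)
    evaluation_process: Optional[str] = None         # only if available
    accessed_at: Optional[datetime] = None           # access time, reflecting a version
    equipment: Optional[str] = None                  # sensor or sensor network

    def summary(self) -> str:
        """Render the parts that are present as a single readable line."""
        parts = [self.identifier]
        if self.owners:
            parts.append("owners: " + ", ".join(self.owners))
        if self.data_sources:
            parts.append("sources: " + ", ".join(self.data_sources))
        if self.accessed_at is not None:
            parts.append("accessed: " + self.accessed_at.isoformat())
        return "; ".join(parts)


# Hypothetical values, for illustration only.
example = Citation(
    identifier="example-id-0001",
    data_sources=["Example station archive"],
    owners=["Example RI"],
    accessed_at=datetime(2020, 3, 27, tzinfo=timezone.utc),
    equipment="Example sensor network",
)
print(example.summary())
```

The sketch does not check resolvability; that would require looking the identifier up in whatever identifier service the RI uses.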

concept

The identifier, name and definition of the meaning of a thing (abstract or real). The definition is human readable when given as sentences, and machine readable when given as relations to other concepts (machine readable sentences). A concept can also be the smallest entity of a conceptual model. It can be part of a flat list of concepts, a hierarchical list of concepts, a hierarchical thesaurus or an ontology.

conceptual model

A collection of concepts, their attributes and their relations. It can be unstructured or structured (e.g. glossary, thesaurus, ontology). Usually the description of a concept and/or a relation defines the concept in a human readable form. Conceptual models can also be represented in machine readable formats, for instance RDFS or OWL. Those sentences can be used to construct a self description. It is common practice to provide both the human readable description and the machine readable description within the same system. In this sense, a conceptual model can also be seen as a collection of human and machine readable sentences. They can be local, developed within a project, or global, accepted and used by a wider community (such as GEMET or OBOE). Conceptual models can be used to annotate data (e.g. within a network of triple stores).
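
The idea of pairing human readable definitions with machine readable relations can be sketched as follows. This is a toy in-memory model, not RDFS or OWL; the class names, the "broader" predicate and the example concepts are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Concept:
    identifier: str
    name: str
    definition: str  # human readable sentence


# A machine readable sentence: (subject id, predicate, object id).
Triple = Tuple[str, str, str]


class ConceptualModel:
    """A collection of concepts and the relations between them."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.relations: Set[Triple] = set()

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.identifier] = concept

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Only known concepts may take part in a relation.
        for identifier in (subject, obj):
            if identifier not in self.concepts:
                raise KeyError(f"unknown concept: {identifier}")
        self.relations.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> List[str]:
        """Return the ids related to `subject` via `predicate`, sorted."""
        return sorted(o for s, p, o in self.relations
                      if s == subject and p == predicate)


# Hypothetical content, for illustration only.
model = ConceptualModel()
model.add_concept(Concept("c1", "temperature", "A measure of warmth or coldness."))
model.add_concept(Concept("c2", "air temperature", "Temperature of the air."))
model.relate("c2", "broader", "c1")
print(model.objects("c2", "broader"))
```

A flat list of concepts corresponds to a model with no relations, while a thesaurus or ontology adds progressively richer sets of triples.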

data

Research data processed by the RI. This is the base information object class from which all other information objects are derived.

data provenance

Metadata that traces the origins of data and records all state changes of data during their lifecycle and their movements between storages.

An entry in the data provenance records, created whenever an action is performed on the data, typically contains:

  • date/time of action;
  • actor;
  • type of action;
  • data identification.

A data provenance system is an annotation system for managing data provenance. Usually, unique identifiers are used to refer to the data in their different states and to describe those states.
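
The four items of a provenance entry map naturally onto a small record type. The sketch below is an assumption about how such records might be kept in memory; the class and method names are hypothetical, and a real provenance system would persist the entries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ProvenanceEntry:
    timestamp: datetime   # date/time of action
    actor: str            # who or what performed the action
    action_type: str      # type of action
    data_id: str          # unique identifier of the data concerned


class ProvenanceLog:
    """Append-only list of provenance entries."""

    def __init__(self) -> None:
        self._entries: List[ProvenanceEntry] = []

    def record(self, actor: str, action_type: str, data_id: str) -> ProvenanceEntry:
        # Every action on the data triggers the creation of one entry.
        entry = ProvenanceEntry(datetime.now(timezone.utc), actor, action_type, data_id)
        self._entries.append(entry)
        return entry

    def history(self, data_id: str) -> List[ProvenanceEntry]:
        """Return all entries for one data object, in recording order."""
        return [e for e in self._entries if e.data_id == data_id]


# Hypothetical usage, for illustration only.
log = ProvenanceLog()
log.record("sensor-gateway", "acquire", "data-001")
log.record("curator", "quality-check", "data-001")
log.record("curator", "acquire", "data-002")
print([e.action_type for e in log.history("data-001")])
```

Entries are immutable (`frozen=True`) so that a recorded state change cannot be altered afterwards, which is the property a trace of origins depends on.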

measurement result

Quantitative, qualitative, or cataloguing determinations of magnitude, dimension, and uncertainty applied to the outputs of observation instruments, sensors, sensor networks, human observers and observer networks.

metadata

Data about data. In scientific applications, metadata is used to describe, explain, locate, or make it easier to retrieve, use, or manage a data resource.

There have been numerous attempts to classify the various types of metadata. As one example, NISO (National Information Standards Organization) distinguishes between three types of metadata based on their functionality: descriptive metadata, which describes a resource for purposes such as discovery and identification; structural metadata, which indicates how compound objects are put together; and administrative metadata, which provides information to help manage a resource. This classification is not restrictive: different applications may classify their own metadata in different ways.

Metadata is generally encoded in a metadata schema, which defines a set of metadata elements and the rules governing their use to describe a resource. The characteristics of a metadata schema normally include the number of elements, the name of each element, and the meaning of each element. The definition or meaning of the elements is the semantics of the schema, typically descriptions of the location, physical attributes, type (e.g. text or image, map or model), and form (e.g. print copy, electronic file). The value of each metadata element is the content. Sometimes there are content rules and syntax rules. The content rules specify how content should be formulated, representation constraints for content, allowable content values and so on. The syntax rules specify how the elements and their content should be encoded. Some popular syntaxes used in scientific applications include:

Such syntax encoding allows the metadata to be processed by a computer program.
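
The distinction between elements, content rules and syntax rules can be illustrated with a short sketch. Everything named here is an assumption for the sake of the example: the three elements, their allowed values, and the choice of JSON as the encoding syntax are hypothetical and not taken from any particular metadata standard.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema: element name -> (meaning, content rule).
SCHEMA: Dict[str, Tuple[str, Callable[[Any], bool]]] = {
    "title": ("Name of the data resource",
              lambda v: isinstance(v, str) and v != ""),
    "type": ("Kind of resource",
             lambda v: v in {"text", "image", "map", "model"}),
    "form": ("Form of the resource",
             lambda v: v in {"print copy", "electronic file"}),
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Check a record against the content rules; return a list of problems."""
    errors = []
    for name, (_meaning, rule) in SCHEMA.items():
        if name not in record:
            errors.append(f"missing element: {name}")
        elif not rule(record[name]):
            errors.append(f"invalid content for element: {name}")
    for name in record:
        if name not in SCHEMA:
            errors.append(f"unknown element: {name}")
    return errors


def encode(record: Dict[str, Any]) -> str:
    """Apply the syntax rule: serialise a valid record as JSON."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)


# Hypothetical record, for illustration only.
record = {"title": "Example soil map", "type": "map", "form": "electronic file"}
print(encode(record))
```

Separating `validate` (content rules) from `encode` (syntax rules) mirrors the text: the same validated content could be serialised in a different syntax without touching the schema.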

Many standards for representing scientific metadata have been developed within disciplines, sub-disciplines or individual projects or experiments. Some widely used scientific metadata standards include:

Two aspects of metadata give rise to the complexity in management:

  • Metadata are data, and data become metadata when they are used to describe other data. The transition happens under particular circumstances, for particular purposes, and with certain perspectives, as no data are always metadata. The set of circumstances, purposes, or perspectives for which some data are used as metadata is called the ‘context’. So metadata are data about data in some ‘context’.
  • Metadata can be layered. This happens because data objects may move to different stages during their life in a digital environment requiring their association to different layers of metadata at each stage.

Metadata can be fused with the data. However, in many applications, such as a provenance system or a distributed satellite image annotation system, the metadata and data are often created and stored separately, as they may be generated by different users, in different computing processes, and stored at different locations and in different types of storage. Often, there is more than one set of metadata related to a single data resource, e.g. when the existing metadata becomes insufficient, users may design new templates to create another metadata collection. Efficient software and tools are required to manage the linkage between metadata and data. Such linkage relationships between metadata and data are vulnerable to failures in the processes that create and maintain them, and to failures in the systems that store their representations. It is important to devise methods that reduce these failures.

metadata catalogue

A collection of metadata, usually established to make the metadata available to a community. A metadata catalogue can be exposed through an access service.

metadata state

Metadata state is an object property that determines the set of all sequences of actions (or traces) in which the metadata object can participate, at a given instant in time (as defined in ODP, ISO/IEC 10746-2).

In their lifecycle, metadata may have the states described in the following table.

State        Description
raw          Metadata which are not yet registered or organised in a catalogue. Raw metadata are not shareable in this status.
registered   Metadata which have been stored in a metadata catalogue.
annotated    Metadata that are associated to concepts, describing their meaning.
published    Metadata made available to the public, the outside world. Metadata registered within public catalogues.
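
Since a state determines which actions an object can take part in, the four states lend themselves to a small state machine. The sketch below is hedged: the state names come from the table, but the allowed transitions are an assumption, because the text lists the states without saying which moves between them are permitted.

```python
from enum import Enum


class MetadataState(Enum):
    RAW = "raw"
    REGISTERED = "registered"
    ANNOTATED = "annotated"
    PUBLISHED = "published"


# Assumed transitions; the source lists the states but not the allowed moves.
ALLOWED_TRANSITIONS = {
    MetadataState.RAW: {MetadataState.REGISTERED},
    MetadataState.REGISTERED: {MetadataState.ANNOTATED, MetadataState.PUBLISHED},
    MetadataState.ANNOTATED: {MetadataState.PUBLISHED},
    MetadataState.PUBLISHED: set(),
}


def transition(current: MetadataState, target: MetadataState) -> MetadataState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Hypothetical lifecycle, for illustration only.
state = MetadataState.RAW
state = transition(state, MetadataState.REGISTERED)
state = transition(state, MetadataState.ANNOTATED)
print(state.value)
```

Under these assumed rules, raw metadata cannot be published directly, which reflects the statement that raw metadata are not shareable.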

persistent data

Data are the representations of information dealt with by information systems and their users (as defined in ODP, ISO/IEC 10746-2). Persistent data denotes data that are persisted (stored for the long term).