CV Data Processing

 
The processing of data can be tightly integrated into data handling systems, or it can be delegated to a separate set of services invoked on demand. In general, the more complicated processing tasks require separate services. Dedicated processing services become significantly more important when large quantities of data are curated within a research infrastructure. Scientific data, for example, is often subject to extensive post-processing and analysis in order to extract new results. The data processing objects of an infrastructure encapsulate the dedicated processing services made available to that infrastructure, either within the infrastructure itself or delegated to a client infrastructure.
  
{| class="wikitable" style="width: 85%"
|-
! style="padding: 10px;"| <div style='text-align: left;'>'''Data Processing Objects'''</div>
|-
| style="background-color:#ffffff;"| <center>[[File:CVDataProcessing.png]]</center>
<div style='text-align: right;'>'''[[Notation of Computational Viewpoint Models#Computational Objects|Notation]]'''</div>
|}
  
The internal staging of data within an infrastructure for processing requires coordination between data processing components (which handle the actual processing workflow) and data curation components (which hold data within the infrastructure). The diagram below shows these two groups of objects, which form part of the processing subsystem.
  
Data processing requests generally originate from an [[CV Presentation Objects#Experiment laboratory|'''experiment laboratory''']], which validates requests by invoking an [[CV Service Objects#AAAI service|'''AAAI service''']]. The [[CV Presentation Objects#Experiment laboratory|'''experiment laboratory''']] sends a process request to a [[CV Service Objects#Coordination service|'''coordination service''']], which interprets the request and starts a processing workflow by invoking the required [[CV Component Objects#Process controller|'''process controller''']].
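The request path described above (validation by the AAAI service, then dispatch through the coordination service to a process controller) can be sketched in a few lines of Python. This is only an illustrative sketch: the class names mirror the computational objects in the text, but every method name, field and return value is a hypothetical choice made for the example and is not part of the model.

```python
from dataclasses import dataclass, field


@dataclass
class AAAIService:
    """Hypothetical stand-in: knows which users may submit requests."""
    authorised_users: set

    def validate(self, user: str) -> bool:
        return user in self.authorised_users


@dataclass
class ProcessController:
    """Hypothetical process controller that records the workflows it starts."""
    started: list = field(default_factory=list)

    def start_workflow(self, task: str) -> str:
        self.started.append(task)
        return f"workflow started: {task}"


@dataclass
class CoordinationService:
    """Interprets a process request and invokes the matching controller."""
    controllers: dict  # assumed mapping: task name -> ProcessController

    def handle(self, task: str) -> str:
        if task not in self.controllers:
            raise KeyError(f"no process controller registered for {task!r}")
        return self.controllers[task].start_workflow(task)


@dataclass
class ExperimentLaboratory:
    aaai: AAAIService
    coordination: CoordinationService

    def submit(self, user: str, task: str) -> str:
        # Validate first; only validated requests reach the coordination service.
        if not self.aaai.validate(user):
            raise PermissionError(f"request from {user!r} rejected by AAAI service")
        return self.coordination.handle(task)
```

In this sketch a rejected request never reaches the coordination service, which reflects the ordering given in the text: validation happens in the experiment laboratory before the process request is sent.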
  
 
Because data must be retrieved from the data store and passed to the execution platform, the [[CV Service Objects#Coordination service|'''coordination service''']] requests that a [[CV Service Objects#Data transfer service|'''data transfer service''']] prepare a data transfer. The [[CV Service Objects#Data transfer service|'''data transfer service''']] then configures and deploys a [[CV Component Objects#Data exporter|'''data exporter''']], which handles the transfer of data between the storage and execution platforms, i.e. it performs data staging. A data-flow is established between all required [[CV Component Objects#Data store controller|'''data store controllers''']] and the [[CV Component Objects#Process controller|'''process controller''']] via the [[CV Component Objects#Data exporter|'''data exporter''']]. After the data-flow is established, processing starts. Processing can include a host of activities such as summarising, mining, charting and mapping, amongst many others. The details are left open to allow the modelling of any processing procedure. The expected output of the processing activities is a derived data product, which in turn needs to be persisted into the RI's data stores.
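The staging step can be illustrated with a minimal Python sketch, under the assumption that a data-flow is simply a copy from each data store controller into the process controller. The class names follow the objects in the text; the methods (<code>read</code>, <code>receive</code>, <code>stage</code>, <code>prepare_export</code>, <code>run</code>) and the in-memory data layout are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataStoreController:
    records: dict  # assumed layout: dataset id -> list of values

    def read(self, dataset_id):
        return list(self.records[dataset_id])


@dataclass
class ProcessController:
    staged: list = field(default_factory=list)

    def receive(self, values):
        self.staged.extend(values)

    def run(self, procedure):
        # The procedure is deliberately left open, as in the model:
        # any callable over the staged data yields a derived data product.
        return procedure(self.staged)


@dataclass
class DataExporter:
    sources: list  # (DataStoreController, dataset id) pairs
    target: ProcessController

    def stage(self):
        # Establish the data-flow: storage platform -> execution platform.
        for store, dataset_id in self.sources:
            self.target.receive(store.read(dataset_id))


class DataTransferService:
    def prepare_export(self, sources, target):
        # "Configure and deploy" is reduced here to constructing the exporter.
        return DataExporter(sources=sources, target=target)
```

A caller would obtain an exporter from <code>prepare_export</code>, call <code>stage()</code>, and then pass any procedure (for example a summarising function) to <code>run</code>; the value returned plays the role of the derived data product.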
 
{| class="wikitable"
 
|-
 
! style="padding: 10px"| <div style='text-align: left;'>'''Data Processing Subsystem - data staging'''</div>
 
|-
 
| style="background-color:#ffffff;"| [[File:CVDataProcessing01.png|1000px]]
 
<div style='text-align: right;'>'''[[Notation of Computational Viewpoint Models#Computational Objects|Notation]]'''</div>
 
|}
 
 
== <span style="color: #BBCE00" id="DataPersistence">Data Persistence</span> ==
 
 
The persistence of derived data products produced by processing data within an infrastructure also requires coordination between data processing components (which handle the actual processing workflow) and data curation components (which hold data within the infrastructure). The diagram below shows these two groups of objects, which form part of the processing subsystem.
 
 
Data processing requests generally originate from an [[CV Presentation Objects#Experiment laboratory|'''experiment laboratory''']], which validates requests by invoking an [[CV Service Objects#AAAI service|'''AAAI service''']]. The [[CV Presentation Objects#Experiment laboratory|'''experiment laboratory''']] can present results and ask the user whether they need to be stored; alternatively, the user may configure the service to store the resulting data automatically. In either case, after processing, the [[CV Presentation Objects#Experiment laboratory|'''experiment laboratory''']] sends a process request to the [[CV Service Objects#Coordination service|'''coordination service''']], which interprets the request and invokes the [[CV Component Objects#Process controller|'''process controller''']], which in turn prepares the result data for transfer.
 
 
The [[CV Service Objects#Data transfer service|'''data transfer service''']] then configures and deploys a [[CV Component Objects#Data importer|'''data importer''']], which handles the transfer of data between the execution and storage platforms. A data-flow is established between the [[CV Component Objects#Process controller|'''process controller''']] and the [[CV Component Objects#Data store controller|'''data store controller''']] via the [[CV Component Objects#Data importer|'''data importer''']]. After the data-flow is established, the data transfer starts. The persistence of data triggers various curation activities, including data storage, backup, updating of catalogues, acquiring identifiers and updating records. These activities can occur automatically, or they can merely send signals alerting human users that an action is expected.
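The persistence step, including the distinction between curation activities that run automatically and those that only signal a human user, can be sketched as follows. This is a minimal illustration under stated assumptions: the activity names are taken loosely from the list above, while the <code>automatic</code> flag, the method names and the in-memory bookkeeping are hypothetical and not defined by the model.

```python
from dataclasses import dataclass, field

# Follow-up curation activities, paraphrased from the text (names are illustrative).
CURATION_ACTIVITIES = ("backup", "catalogue update", "identifier acquisition", "record update")


@dataclass
class ProcessController:
    results: dict  # assumed layout: product id -> derived data product

    def export_result(self, product_id):
        return self.results[product_id]


@dataclass
class DataStoreController:
    stored: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)      # activities run automatically
    notifications: list = field(default_factory=list)  # signals for human users

    def persist(self, product_id, data, automatic=True):
        self.stored[product_id] = data  # data storage itself always happens
        for activity in CURATION_ACTIVITIES:
            if automatic:
                self.completed.append((product_id, activity))
            else:
                self.notifications.append(f"action expected: {activity} for {product_id}")


@dataclass
class DataImporter:
    source: ProcessController
    target: DataStoreController

    def transfer(self, product_id, automatic=True):
        # Data-flow: execution platform -> storage platform.
        self.target.persist(product_id, self.source.export_result(product_id), automatic)
```

Treating the automatic and signalled cases as a single flag is a simplification; in practice each curation activity could be configured independently.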
 
 
{| class="wikitable"
 
|-
 
! style="padding: 10px"| <div style='text-align: left;'>'''Data Processing Subsystem - data persistence'''</div>
 
|-
 
| style="background-color:#ffffff;"| [[File:CVDataProcessing02.png|1000px]]
 
<div style='text-align: right;'>'''[[Notation of Computational Viewpoint Models#Computational Objects|Notation]]'''</div>
 
|}
 
  
 
[[Category:CV Objects and Subsystems]]
 
