Example 1: Using the Reference Model to Guide Research Activities (EISCAT 3D - EGI)



  
 
=== <span style="color: #BBCE00">Evaluation of the Feasibilities of the EGI Infrastructures and Services in Supporting EISCAT 3D Requirements</span> ===
 
 
Using the common framework provided by the Reference Model, we can analyse and compare the EGI and EUDAT generic service infrastructures against the requirements of a domain-specific data infrastructure such as EISCAT 3D. This comparison reveals significant gaps, including but not limited to:
 
 
* Staging services to ship scientific data from observatory networks into the EGI generic service infrastructure (and to get the data out again) are missing. Such a staging service should be able to transmit both large chunks of data (up to petabytes) and continuous updates or real-time data streams during operations. It should also satisfy performance requirements, including:
 
** Robust. Environmental scientific research needs high-quality data. In particular, losing observation data during important natural events is unaffordable. Fault tolerance is desirable: the transmission service should be able to recover from the point of interruption without restarting the whole transmission process.
 
** Fast. For example, in the case of EISCAT 3D, the 10 PB ring buffer can only hold data for about 3 days, so the observation data must be transferred to archive storage fast enough to avoid being overwritten.
 
** Cheap. For example, the observatory networks are remote from the EGI computing farm. Using high-capacity pipes is possible but expensive. Software solutions such as intelligent network protocols, optimisation and data compression are desirable.
 
* Cost-effective large storage facilities and long-term archiving mechanisms are urgently needed. Environmental data, in particular for climate research, need to be preserved over the long term to be useful. Being Grid-oriented, EGI is not designed for data archiving purposes. Although large storage capabilities are potentially available through NGI participants, EGI does not guarantee long-term persistent data preservation. Curation services such as advanced data identification, cataloguing and replication are absent from the EGI service list.
 
* The EGI infrastructure needs to adapt in order to handle the emerging big data phenomenon. The challenge is how to integrate what is new with what already exists. Services such as job schedulers need to be redesigned to take into account the cost of moving big data; intelligent data partitioning services should be investigated as a way to improve the performance of big data processing.
 
* Advanced searching and data discovery facilities are urgently needed. It is often said that data volume, velocity, and variety define big data, but the unique characteristic of big data is the manner in which the value is discovered <nowiki>[</nowiki>[[Bibliography#38|'''38''']]<nowiki>]</nowiki>. Unlike conventional analysis approaches where the simple summing of a known value reveals a result, big data analytics and the science behind them filter low value or low-density data to reveal high value or high-density data <nowiki>[</nowiki>[[Bibliography#38|'''38''']]<nowiki>]</nowiki>. Novel approaches are needed to discover meaningful insights through deep, complex search, e.g., using machine learning, statistical modelling and graph algorithms. Without facilities to unlock the value of big data, expensively generated and archived scientific data will be useless.
 
* Community support services are insufficient. The big data phenomenon will eventually lead to a new data-centric way of conceptualising, organising and carrying out research activities, which could in turn introduce a new approach to conducting science. A new generation of data scientists is emerging with new requirements, and service facilities should be planned to support their needs. Together, these should enable the EISCAT 3D community to design new applications that are capable of working with big data and to implement them on cutting-edge European Distributed Computing Infrastructures.
 
* Currently, EUDAT has taken up the role of implementing a collaborative data infrastructure; however, only a few services are available, storage facilities are insufficient, and usage policies are unclear. We are currently investigating the possibility of integrating EUDAT services into the EGI infrastructure as a layer on top of the EGI federated computing facility. The analysis of the EUDAT services is included in [[Example 2: Using the Reference Model as an Analysis Tool (EUDAT)]] of the Reference Model.
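
The robustness requirement on the staging service (recovering from the point of interruption rather than restarting the whole transfer) can be illustrated with a minimal sketch. It is not part of any EGI or EISCAT 3D service: the function name <code>resumable_copy</code>, the <code>fail_after_chunks</code> parameter (used only to simulate an interruption) and the use of local files in place of a network transfer are all assumptions made for illustration.

```python
import os


def resumable_copy(src_path, dst_path, chunk_size=1024 * 1024, fail_after_chunks=None):
    """Copy src_path to dst_path in chunks, resuming where a previous run stopped.

    Hypothetical helper: the bytes already present in dst_path are treated as
    the checkpoint, so a restarted transfer only sends the remaining bytes.
    fail_after_chunks simulates an interruption and is for demonstration only.
    """
    # The resume point is the number of bytes already written to the destination.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    chunks = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        while True:
            if fail_after_chunks is not None and chunks >= fail_after_chunks:
                raise ConnectionError("simulated interruption")
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            chunks += 1
    return copied  # bytes transferred during this call only
```

Calling the function again after a failure transfers only the missing tail of the file. A production service would additionally need integrity checks (for example checksums per chunk), which this sketch leaves out.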
 
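
The speed requirement can be made concrete with a rough calculation based on the figures quoted above (a 10 PB ring buffer holding about 3 days of data). The sketch below assumes decimal petabytes (10<sup>15</sup> bytes), a constant fill rate and a retention of exactly 3 days; none of these assumptions come from the EISCAT 3D design documents, and the function name is hypothetical.

```python
PB = 10**15  # assumption: decimal petabyte


def required_throughput(buffer_bytes, retention_seconds):
    """Average rate (bytes/s) at which data must leave the ring buffer
    so that it is archived before being overwritten."""
    return buffer_bytes / retention_seconds


buffer_bytes = 10 * PB            # ring-buffer capacity quoted in the text
retention_seconds = 3 * 24 * 3600  # "about 3 days", taken as exactly 3 days

rate = required_throughput(buffer_bytes, retention_seconds)
print(f"{rate / 1e9:.1f} GB/s sustained")
print(f"{rate * 8 / 1e9:.1f} Gbit/s sustained")
```

Under these assumptions the archive link has to sustain roughly 39 GB/s (about 309 Gbit/s) on average, which illustrates why both transfer speed and transfer cost appear among the requirements.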
 
== <span style="color: #BBCE00">Summary</span> ==
 
 
In this example, we have shown that the Reference Model can be used to conduct various system analysis tasks. Using the Reference Model we have:
 
 
* Clarified the boundary of the EISCAT 3D data infrastructure and identified missing functionalities in the design;
 
* Provided a solution to integrate the EGI services into the EISCAT 3D data infrastructure;
 
* Identified gaps between the EGI generic service infrastructure and the requirements of a domain-specific research infrastructure, EISCAT 3D.
 
 
We have shown that the Reference Model offers a research infrastructure:
 
 
* A knowledge base containing useful information that can be referred to in various system analysis and design activities;
 
* A uniform platform into which computational elements of different infrastructures can be fitted, enabling comparison and analysis;
 
* A way of thinking about the construction of plausible system architectures.
 
  
 
[[Category:Appendix D Guidlines for using the Reference Model]]
 
