Form field | Content | Notes
VESPA-Cloud
  • Baptiste Cecconi, LESIA, Observatoire de Paris, France
  • Pierre Le Sidaner, DIO, Observatoire de Paris, France
  • Stéphane Erard, LESIA, Observatoire de Paris, France
  • Angelo Pio Rossi, Jacobs Uni, Bremen, Germany
  • Markus Demleitner, Heidelberg Uni, Germany
  • Marco Molinaro, OATF-INAF, Trieste, Italy
Provide full names with affiliations
baptiste.cecconi@obspm.fr

VESPA (Virtual European Solar and Planetary Access) is a network of consistent data services covering all fields of solar system science. It is a mature project, developed within EUROPLANET-2020-RI, a project funded through H2020 (ended in Aug. 2019), and will be further supported under the EUROPLANET-2024-RI project (starting in Feb. 2020). Data providers use a standard API, based on the IVOA Table Access Protocol (TAP) and on EPNcore, a common metadata dictionary. VESPA services consist of searchable metadata tables with links to science data products (files, web services...). VESPA thus provides a unified data discovery service for solar system science.
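As an illustration of the standard API, the sketch below builds an ADQL query (the query language used with TAP) against a service's EPNcore table. It assumes the usual VESPA convention of a table named `<schema>.epn_core`. The schema name `demo` and the helper `build_epncore_query` are hypothetical, and only a handful of EPNcore columns are selected.

```python
from typing import Optional


def _adql_literal(value: str) -> str:
    """Quote a string for ADQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_epncore_query(schema: str, target_name: str,
                        dataproduct_type: Optional[str] = None,
                        max_rows: int = 100) -> str:
    """Build an ADQL query on a hypothetical '<schema>.epn_core' table.

    Only a few EPNcore columns are selected; real services expose many more.
    """
    if not schema.isidentifier():
        raise ValueError(f"invalid schema name: {schema!r}")
    conditions = [f"target_name = {_adql_literal(target_name)}"]
    if dataproduct_type is not None:
        # EPNcore uses short codes here, e.g. 'im' for images, 'sp' for spectra.
        conditions.append(f"dataproduct_type = {_adql_literal(dataproduct_type)}")
    where = " AND ".join(conditions)
    return (f"SELECT TOP {int(max_rows)} granule_uid, target_name, "
            f"dataproduct_type, time_min, time_max, access_url "
            f"FROM {schema}.epn_core WHERE {where}")


print(build_epncore_query("demo", "Mars", dataproduct_type="im", max_rows=10))
```

Because every provider shares the same EPNcore columns, the same query text can be sent unchanged to any VESPA service. Only the schema name differs.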

Hosting and maintaining VESPA providers' servers has proved to be a single point of failure for small teams with little IT support. The project with EOSC would greatly facilitate data sharing by small teams, or by teams whose institutions have restrictive firewall policies (e.g., labs hosted in space agencies, such as DLR in Germany).

Current plan for this project:

Implementation of VESPA providers' servers on EOSC-Hub cloud VM instances, as well as a test instance of the VESPA portal.

Next steps for near-future developments:

  • New VESPA portal architecture, based on Lucene-like technologies. This would greatly enhance the portal search interface and make VESPA interoperable with the search engine of the NASA Planetary Data System (PDS4).
  • Access to the VESPA network through Python scripts based on community packages (astropy, pyvo...), within a JupyterLab facility.
  • On-demand computing services (models, cutouts, resampling...).
Please write a brief and plain explanation of the proposed project and the scientific or technical challenge it will address. Please specify how the research community will benefit from this project.

Each VESPA provider hosts and maintains a server (physical or virtualized) running the same software distribution (DaCHS, https://dachs-doc.readthedocs.io), which implements the interoperability layers (from IVOA and VESPA) and follows FAIR principles. Each server hosts a table of standardized metadata with URLs to data files or data services. Data files can be hosted by the VESPA provider team or in an external archive (e.g., ESA/PSA, the Planetary Science Archive).
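The structure of one row in such a metadata table can be sketched as follows. The fields are a small subset of EPNcore: times are Julian days, and `access_format` is a MIME type. The class `Granule`, its method `is_external`, and the host names in the example are illustrative assumptions. The method tells a file served by the provider itself apart from one held in an external archive.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Granule:
    """One row of a provider's metadata table (illustrative EPNcore subset)."""
    granule_uid: str
    granule_gid: str
    obs_id: str
    dataproduct_type: str                # e.g. 'im' (image), 'sp' (spectrum)
    target_name: str
    time_min: Optional[float] = None     # Julian day
    time_max: Optional[float] = None     # Julian day
    access_url: Optional[str] = None     # link to the data file or data service
    access_format: Optional[str] = None  # MIME type of the linked product

    def __post_init__(self) -> None:
        # Reject inverted time intervals when both bounds are given.
        if (self.time_min is not None and self.time_max is not None
                and self.time_min > self.time_max):
            raise ValueError("time_min must not exceed time_max")

    def is_external(self, provider_host: str) -> bool:
        """True if access_url points outside the provider's own host."""
        if self.access_url is None:
            return False
        return urlparse(self.access_url).hostname != provider_host.lower()


# Hypothetical rows: one file served locally, one held in an external archive.
local = Granule("g1", "raw", "obs1", "im", "Mars",
                access_url="http://vespa.example.org/data/g1.fits")
remote = Granule("g2", "raw", "obs2", "sp", "Mars",
                 access_url="https://archive.example.net/psa/g2.xml")
print(local.is_external("vespa.example.org"), remote.is_external("vespa.example.org"))
```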

The VESPA search portal is developed and maintained at Observatoire de Paris (Paris, France).

A small prototype is already running on virtual machines deployed at Catania CC. The DaCHS framework is installed and the astronomy APIs are reachable. Mock VESPA services are installed and can be queried.
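A prototype service of this kind can be queried over plain HTTP through the TAP synchronous endpoint. The sketch below only builds the request URL with standard TAP parameters (`REQUEST`, `LANG`, `QUERY`, `FORMAT`, `MAXREC`). The base URL is a placeholder, and the `/tap` path is an assumption about how the DaCHS instance is exposed.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url: str, adql: str, fmt: str = "votable",
                 maxrec: int = 1000) -> str:
    """Build a TAP synchronous query URL (GET form) for an ADQL query."""
    params = {
        "REQUEST": "doQuery",   # TAP request type for running a query
        "LANG": "ADQL",         # query language
        "QUERY": adql,
        "FORMAT": fmt,          # output format, e.g. VOTable
        "MAXREC": str(maxrec),  # cap on the number of returned rows
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


url = tap_sync_url("http://vespa-test.example.org/tap",
                   "SELECT TOP 5 granule_uid, access_url FROM demo.epn_core")
print(url)
# The VOTable response could then be fetched with urllib.request.urlopen(url).
```

In practice, client libraries such as pyvo wrap this exchange. The sketch only shows that nothing beyond HTTP is required to reach a VESPA service.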

Please specify what kind of technology you are using (Grids, Cloud computing, HPC, HTC, data storage, data repositories, data management systems, data discovery services, etc.) and the related Technology Readiness Level (TRL) (please, provide evidence of the declared TRL with links to technical documentation, papers, etc)

We propose to use the EOSC infrastructure to host VESPA providers' servers, through a controlled deployment environment based on containers (e.g., Docker).

The VESPA providers would be able to:

  • order a VM with the whole server framework installed;
  • configure the server for their science application;
  • co-administer the server packages with the VESPA team;
  • update the content and the tables.

In a second phase, we will set up an EOSC-Hub-hosted VESPA portal, using the web interface developed at Observatoire de Paris.

Please explain what services and resources you are planning to use and integrate and the benefits that you are expecting from the new developed solution.
1.3 Physical sciences (Astronomy)

Please state the area of science of the project including one or more codes from the OECD Frascati Fields of Science most appropriate to the project, the codes can be consulted at https://www.oecd.org/science/inno/38235147.pdf

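VESPA is a distributed (but not redundant) data discovery and access framework: each dataset is served by a single provider, so any server outage removes that dataset from the network. Hosting VESPA services in the cloud improves their availability, and thus the reliability of the whole VESPA network.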

VESPA-Cloud enhances the science return in the field of solar and planetary science.

VESPA-Cloud is a simple proof of concept, which will demonstrate the use and efficiency of the EOSC-Hub infrastructure and services to the solar system science community.


The goal of VESPA is to make Solar System data Findable and Accessible through Interoperable interfaces. VESPA also recommends standard data and metadata formats, ensuring Reusability.

All software developed for VESPA is open source (mostly GPLv3).

VESPA-Cloud enhances accessibility by providing sustainable access to VESPA datasets.


12 months to set up the services and to find a sustainable approach for further operations.

The minimal individual VESPA provider instance is: 2 CPUs, 4 GB RAM, 20 GB disk.

We expect to deploy about 20 instances in the first stage. Each instance must have a fixed, public IP address (customizable DNS names preferred). The instances are expected to be up and running at all times. Short service interruptions are acceptable if the instance can be relaunched automatically.
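The automatic relaunch requirement could be met by a simple external watchdog. In the sketch below, the probe assumes each service exposes the IVOA VOSI `availability` resource under its TAP base URL. `restart` is a hypothetical callback standing in for whatever relaunch mechanism the cloud platform provides. A restart fires only after several consecutive failed checks, so brief glitches are tolerated.

```python
import http.client
import urllib.request
from typing import Callable


def vosi_available(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if <base_url>/availability answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/availability",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Network and HTTP errors (URLError/HTTPError subclass OSError),
        # protocol errors and malformed URLs all count as "unavailable".
        return False


def watchdog_step(check: Callable[[], bool], restart: Callable[[], None],
                  failures: int, max_failures: int = 3) -> int:
    """Run one health check and return the updated consecutive-failure count.

    The (hypothetical) restart callback fires once max_failures consecutive
    checks have failed, after which the counter is reset.
    """
    if check():
        return 0
    failures += 1
    if failures >= max_failures:
        restart()
        return 0
    return failures
```

A scheduler outside the monitored instance would call `watchdog_step(lambda: vosi_available(url), relaunch, failures)` periodically, carrying the returned count between calls. Running the watchdog elsewhere matters: a watchdog inside a crashed VM cannot restart it.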

Storage buffer for data ingestion: a global temporary storage volume that can be mounted on any VESPA provider's instance. Data are pushed onto this volume for initial metadata extraction and ingestion. The volume should be 10 TB. For providers needing permanent storage, we will investigate solutions with EUDAT or other EOSC partners.
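The first step of ingestion from the shared buffer lists the pushed files and derives their access-related metadata. It might look like the sketch below. The mount point, URL prefix, accepted file extensions and the function name `scan_buffer` are assumptions. Extracting science metadata (targets, time ranges) would need format-specific readers and would more likely be handled by DaCHS's own data-import tooling than by a script like this. `access_estsize` is given in kB, as in EPNcore.

```python
import os
from typing import Dict, Iterator, Tuple

DEFAULT_EXTENSIONS: Tuple[str, ...] = (".fits", ".cdf", ".img", ".csv")


def scan_buffer(buffer_dir: str, url_prefix: str,
                extensions: Tuple[str, ...] = DEFAULT_EXTENSIONS
                ) -> Iterator[Dict[str, object]]:
    """Yield one access-metadata record per data file found in buffer_dir."""
    for root, dirs, files in os.walk(buffer_dir):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            if not name.lower().endswith(extensions):
                continue  # skip non-data files (logs, notes...)
            path = os.path.join(root, name)
            rel = os.path.relpath(path, buffer_dir).replace(os.sep, "/")
            size_bytes = os.path.getsize(path)
            yield {
                "granule_uid": rel,
                "access_url": url_prefix.rstrip("/") + "/" + rel,
                "access_estsize": (size_bytes + 1023) // 1024,  # kB, rounded up
            }


if __name__ == "__main__":
    # Hypothetical mount point and public URL prefix.
    for row in scan_buffer("/mnt/vespa-buffer/provider-x",
                           "https://vespa-x.example.org/data"):
        print(row)
```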

The current version of the VESPA portal has the same compute and storage requirements as a provider instance.
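The first-stage footprint follows directly from these figures. It counts 20 provider instances plus one portal instance, each at the minimal specification, which gives 42 CPUs, 84 GB RAM and 420 GB of disk. The shared 10 TB ingestion buffer comes on top of that. A small sketch of the arithmetic (the names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    cpus: int
    ram_gb: int
    disk_gb: int

    def times(self, n: int) -> "InstanceSpec":
        """Aggregate requirement for n identical instances."""
        return InstanceSpec(self.cpus * n, self.ram_gb * n, self.disk_gb * n)


MINIMAL = InstanceSpec(cpus=2, ram_gb=4, disk_gb=20)  # per provider or portal
PROVIDERS = 20   # first-stage estimate
PORTAL = 1       # the portal has the same specification
BUFFER_TB = 10   # shared ingestion buffer, counted separately

first_stage = MINIMAL.times(PROVIDERS + PORTAL)
print(first_stage, f"+ {BUFFER_TB} TB buffer")
```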

Please specify the minimum amount of resources needed to execute services involved in your project in terms of CPU cores, RAM, storage for data analysis

With the new Europlanet-2024-RI program, we can expect at least 5 new providers per year, each with the same compute and storage needs.
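As a lower bound, growth can be projected from the roughly 20 first-stage providers. Each year adds at least 5 providers, that is, at least 10 CPUs, 20 GB RAM and 100 GB of disk per year at the minimal instance specification. The helper names below are illustrative.

```python
from typing import Dict, List

PER_INSTANCE: Dict[str, int] = {"cpus": 2, "ram_gb": 4, "disk_gb": 20}


def yearly_increment(new_providers: int = 5) -> Dict[str, int]:
    """Minimum extra resources needed for one year's new providers."""
    return {k: v * new_providers for k, v in PER_INSTANCE.items()}


def provider_counts(initial: int, years: int, per_year: int = 5) -> List[int]:
    """Lower-bound number of providers at the end of each year."""
    return [initial + per_year * y for y in range(1, years + 1)]


print(yearly_increment())             # minimum yearly increment
print(provider_counts(20, years=4))   # providers after years 1..4
```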

Among the further developments foreseen after the completion of the pilot program:

  • Next-level data portal (Lucene-like).
  • Access to the VESPA network through Python scripts based on community packages (astropy, pyvo...), within a JupyterLab facility.
  • On-demand computing services (models, cutouts, resampling...).

However, these developments are not yet fully defined or sized.

Please specify the amount of resources in terms of CPU cores, RAM, storage for data analysis needed to scale-up your Project after the completion of the pilot

Not applicable, see below.

Please describe your data sets in volume and granularity and specify the minimum amount of archive storage as well as the access requirements for long-term archiving
The project does not include long-term data preservation, since we only serve up-to-date metadata tables giving access to science-ready data. The provider teams are in charge of their own data preservation, outside the VESPA network project.

Please specify the amount of storage for long-term archiving and corresponding data policies
No sensitive data.

Is the data to be handled Regulated data, Privacy-sensitive etc.?

