Work in Progress

Please don't use this document for now; we are currently updating it.


EPN2024-RI


EUROPLANET2024 Research Infrastructure 

H2020-INFRAIA-2019-1  

Grant agreement no: 871149


Document: VESPA - WP6-2-043- TN-v0.3(25)


doi:10.25935/dgk9-g733


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License






VESPA-Hub Repository Architecture






Start date of project: 01 February 2020

Duration: 48 Months

Responsible WP Leader: Stéphane Erard


Project co-funded by the European Union's Horizon 2020 research and innovation programme

Dissemination level

PU: Public
PP: Restricted to other programme participants (including the Commission Service)
RE: Restricted to a group specified by the consortium (including the Commission Services)
CO: Confidential, only for members of the consortium (excluding the Commission Services)

Project Number: 871149

Project Title: EPN2024 - RI

Project Duration: 48 months: 01 February 2020 – 31 January 2024

Document Number: WP6-task2-043-v0.3

Persistent Identifier: 10.25935/dgk9-g733

Issue Date:

Title of Document: VESPA-Hub Repository Architecture

Contributing Work package: WP6

Dissemination level: PU

License: CC-BY-SA

Author (s):

Abstract: This document describes how the VESPA Hub git repositories are organised to store the VESPA servers and services configuration data.

Document history (to be deleted before submission to Commission)

| Date | Version | Editor | Change | Status |
|  | 0.1 |  | First draft | DRAFT |
|  | 0.2 |  | Updated server path; More details in service section | DRAFT |
|  | 0.3 |  | Fixed service tree layout; Fixed Abstract; Added Use case section | DRAFT |


Introduction

In the course of the Europlanet-RI-2020 program (2015-2019), the VESPA EPNcore services were developed locally. For some teams, sustainability issues (server maintenance, service development, etc.) then appeared. There is thus a need to support VESPA providers in maintaining their installations.

Use Cases

1. Internal Metadata Management: A publisher needs to understand the history of the metadata and data processing instructions, in particular which changes were made when. A repository also is a rich backup in case of failures.

2. Sharing Metadata: There are two plausible scenarios for sharing metadata descriptions:

2a. A publisher wants to publish data rather similar to data already published by someone else and would like to re-use as much as possible;

2b. A publisher wants to take over or mirror a data collection published by someone else.

In the mirror case, there is an additional requirement of keeping the two descriptions in synchronisation.

3. Fallback System: When an existing service fails, a VESPA hub should step in and re-publish the data collection.

Assessment

Use case (1) suggests the individual providers should have repositories of their own so their operations will not by default interfere with other providers' edits.  Use cases (2) make it desirable that the full set of published RDs (in the DaCHS case) should still be open for discovery and inspection to all VESPA participants (to the widest extent possible).  Also, in the mirroring case of use case (2b), providers need to be able to merge between different repositories.
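
As an illustration of the merging requirement in use case (2b), the sketch below lists the git commands a mirroring provider might run to pull the original publisher's changes into its own repository. The remote name `upstream`, the branch name `main` and the example URL are assumptions made for this sketch; they are not prescribed by the VESPA setup. The function only builds the commands, it does not run them.

```python
import shlex


def mirror_sync_commands(upstream_url, remote="upstream", branch="main"):
    """Return the git commands that merge the original repository into a mirror.

    The commands are meant to be run inside the mirror's working copy.
    'remote' and 'branch' are illustrative defaults (assumptions).
    """
    return [
        # Register the original publisher's repository as a second remote.
        ["git", "remote", "add", remote, upstream_url],
        # Download its history without touching the local branches.
        ["git", "fetch", remote],
        # Merge the fetched branch into the currently checked-out branch.
        ["git", "merge", f"{remote}/{branch}"],
    ]


if __name__ == "__main__":
    # Hypothetical upstream URL, for illustration only.
    url = "https://gitlab.example.org/vespa/dachs/services/PADC/some-server/planets.git"
    for command in mirror_sync_commands(url):
        print(shlex.join(command))
```

On later synchronisations the first command is no longer needed, since the remote is already registered.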

Use case (3) (and use case 2b) cannot be fully covered by a simple version control system. A VESPA hub taking over would check out the RD and ancillary files, but it would still not have the data files. Due to the dramatically varying sizes of the data collections published through EPN-TAP (kilobytes to terabytes), there probably is no one-size-fits-all approach.

For small-to-medium-sized data collections, we should investigate the use of dachs datapack together with a bulk storage location; in this scenario, data providers would probably rsync datapacks to that location after major changes. When the fallback system has to kick in, the hub would retrieve and unpack the datapack, check out the RD from version control over the extracted files and then run the ingestion. Alternatively, there might be an extra dachs datapack mode that leaves existing files under version control alone. The online storage of data collections is also being investigated in the VESPA-Cloud project.

In this document, we thus focus on providing a set of versioned repositories for the servers' metadata and the services' resource descriptions.

VESPA Hubs Repositories

The VESPA assets configuration is managed through git repositories hosted on GitLab servers at the VESPA-Hubs. The repositories are organised in two project trees: one for the server configurations, the other for the service contents and configurations. We make use of the GitLab project group feature, which allows permissions and access to be managed through a directory-like hierarchical structure.

In the following sections, the VESPA-Hub repository URL is referred to as gitlab_server_url. Each VESPA-Hub uses the same architecture, but with different server and service contents.

The server managers have to set up project groups with access rights according to the proposed architecture.

The two project trees are organised as follows:

  • Server Repository Tree: gitlab_server_url/vespa/<server_type>/servers/<institute>/<server_name>
  • Service Repository Tree: gitlab_server_url/vespa/<server_type>/services/<institute>/<server_name>/<service_name>

The proposed architecture is suitable for managing VESPA DaCHS servers and their associated VESPA EPNcore services. The elements of the repository tree paths are detailed below:

  • At this point the layer '<server_type>' has only one possible value: dachs. It will also be used for other servers and services (such as run-on-demand services, with UWS server like OPUS).
  • The '<institute>' should be an acknowledged name (by the provider or host, and by VESPA) of the hosting institute or data centre (e.g., PADC, JacobsUni, Heidelberg, CDPP, LATMOS, IAS-Orsay, IWF-Graz, IAP-Prag...). This part of the tree may have several levels for various sub-groups or projects, if required.
  • The '<server_name>' is the domain name (or a short name) for the server (e.g., voparis-tap-planeto.obspm.fr or voparis-tap-planeto)
  • The '<service_name>' is the schema name in the case of VESPA EPN-TAP services (e.g., planets, bass2000...)
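
The two path templates above can be sketched as small helper functions. This is a minimal illustration, not a VESPA tool: the function names and the `https://gitlab.example.org` base URL are hypothetical, while the example values for institute, server name and service name are taken from the list above. The institute may contain several levels separated by `/`, as allowed by the architecture.

```python
def _join(*parts):
    """Join path elements with single slashes, ignoring stray leading/trailing ones."""
    return "/".join(part.strip("/") for part in parts)


def server_repo_url(gitlab_server_url, institute, server_name, server_type="dachs"):
    """Build gitlab_server_url/vespa/<server_type>/servers/<institute>/<server_name>."""
    return _join(gitlab_server_url, "vespa", server_type, "servers",
                 institute, server_name)


def service_repo_url(gitlab_server_url, institute, server_name, service_name,
                     server_type="dachs"):
    """Build gitlab_server_url/vespa/<server_type>/services/<institute>/<server_name>/<service_name>."""
    return _join(gitlab_server_url, "vespa", server_type, "services",
                 institute, server_name, service_name)


if __name__ == "__main__":
    base = "https://gitlab.example.org"  # hypothetical VESPA-Hub URL
    print(server_repo_url(base, "PADC", "voparis-tap-planeto"))
    # https://gitlab.example.org/vespa/dachs/servers/PADC/voparis-tap-planeto
    print(service_repo_url(base, "PADC", "voparis-tap-planeto", "planets"))
    # https://gitlab.example.org/vespa/dachs/services/PADC/voparis-tap-planeto/planets
```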

Authentication and Authorisation Infrastructure (AAI)

VESPA manages its users and groups through the VESPA community AAI, which has been set up at eduTEAMS (an in-kind service offered by GEANT to Europlanet-2024-RI). The eduTEAMS groups are used to set the access rights to the individual repositories. A mapping between the eduTEAMS AAI groups and those of the local GitLab server is required.

eduTEAMS access should be requested from the VESPA Helpdesk.

Server Repository Tree

The server repository tree is used to manage server configuration information. Each DaCHS server repository contains the default_meta.txt and gavo.rc files. The AWStats configuration is also stored in this repository. Deployment scripts are also included (work in progress).

Access authorisation to each repository is managed through eduTEAMS (see AAI section).

Service Repository Tree 

The service repository tree is used to manage the service configuration information. Each DaCHS-hosted service repository contains at least a resource descriptor file (usually named q.rd) and a ReadMe.md file. In addition, any further resources necessary to import the data (e.g., external scripts and grammars, mapping files) or to run any custom services (e.g., external core definitions) must be checked into the repository.
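
The minimal content rule for a service repository can be checked with a few lines of Python. The sketch below is only an illustration: the function name is hypothetical, and it assumes that the resource descriptor sits at the top level of the checkout and carries the `.rd` extension (it is "usually" named q.rd, so any `*.rd` file is accepted).

```python
from pathlib import Path


def missing_service_files(repo_dir):
    """Return a list describing the required files absent from a service checkout.

    An empty list means the minimal content (one resource descriptor and
    a ReadMe.md file) is present at the top level of repo_dir.
    """
    repo = Path(repo_dir)
    missing = []
    # Assumption: the resource descriptor is a top-level *.rd file.
    if not any(repo.glob("*.rd")):
        missing.append("resource descriptor (*.rd, usually q.rd)")
    if not (repo / "ReadMe.md").is_file():
        missing.append("ReadMe.md")
    return missing
```

A provider could call `missing_service_files(".")` from the root of a checkout before pushing; the check does not cover the additional import resources, which depend on each service.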

More details are provided in the "Individual Repository for VESPA Service Resource Descriptor in DaCHS" page.

Access authorisation to each repository is managed through eduTEAMS (see AAI section).

Access to the ObsParis VESPA-Hub server

The ObsParis VESPA-Hub hosts the main VESPA GitLab server, accessible at: https://voparis-gitlab.obspm.fr/vespa. Authenticated access to this server is enabled using eduTEAMS (see AAI section). Registered users shall sign in with eduTEAMS, which redirects them to a selection page where they can select their preferred identity provider.

First connection  

After the eduTEAMS invitation has been accepted and approved, the user shall connect a first time to the https://voparis-gitlab.obspm.fr server, using eduTEAMS authentication. The user should then contact the VESPA Helpdesk again to confirm that access has been granted; the VESPA-Hub team will then add the user to the corresponding repositories and groups.
