Work in Progress
Please don't use this Document for now, we are currently updating it.
EPN2024-RI
EUROPLANET2024 Research Infrastructure
H2020-INFRAIA-2019-1
Grant agreement no: 871149
Document: VESPA - WP6-2-043- TN-v0.3(25)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
VESPA-Hub Repository Architecture
Start date of project: 01 February 2020
Duration: 48 Months
Responsible WP Leader: Stéphane Erard
Project co-funded by the European Union's Horizon 2020 research and innovation programme | ||
Dissemination level | ||
PU | Public | |
PP | Restricted to other programme participants (including the Commission Service) | |
RE | Restricted to a group specified by the consortium (including the Commission Services) | |
CO | Confidential, only for members of the consortium (excluding the Commission Services) |
Project Number | 871149 |
Project Title | EPN2024 - RI |
Project Duration | 48 months: 01 February 2020 – 31 January 2024 |
Document Number | WP6-task2-043-v0.3 |
Persistent Identifier | 10.25935/dgk9-g733 |
Issue Date | |
Title of Document | VESPA-Hub Repository Architecture |
Contributing Work package | WP6 |
Dissemination level | PU |
License | CC-BY-SA |
Author (s) |
Abstract: This document describes how the VESPA Hub git repositories are organised to store the VESPA servers and services configuration data. |
Document history (to be deleted before submission to Commission) | ||||
Date | Version | Editor | Change | Status |
| 0.1 | First draft | DRAFT | |
| 0.2 | Updated server path; more details in service section | DRAFT | |
| 0.3 | Fixed service tree layout | DRAFT |
Introduction
In the course of the Europlanet-RI-2020 program (2015-2019), the VESPA EPNcore services were developed locally. Sustainability issues (server maintenance, service development, etc.) appeared for some teams. There is thus a need to support VESPA providers in maintaining their installations.
Use Cases
Use Case | Description |
---|---|
1. Internal Metadata Management | A publisher needs to understand the history of the metadata and data processing instructions, in particular which changes were made and when. A repository also serves as a rich backup in case of failure. |
2. Sharing Metadata | There are two plausible scenarios for sharing metadata descriptions: 2a. A publisher wants to publish data rather similar to data already published by someone else and would like to re-use as much as possible; 2b. A publisher wants to take over or mirror a data collection published by someone else. In the mirror case, there is an additional requirement of keeping the two descriptions in synchronisation. |
3. Fallback System | When an existing service fails, a VESPA hub should step in and re-publish the data collection. |
Assessment
Use case (1) suggests the individual providers should have repositories of their own so their operations will not by default interfere with other providers' edits. Use cases (2) make it desirable that the full set of published RDs (in the DaCHS case) should still be open for discovery and inspection to all VESPA participants (to the widest extent possible). Also, in the mirroring case of use case (2b), providers need to be able to merge between different repositories.
Use case (3) (and use case 2b) cannot be fully covered by a simple version control system. A VESPA hub taking over a service would check out the RD and ancillary files, but it would still not have the data files. Because the sizes of the data collections published through EPN-TAP vary dramatically (kilobytes to terabytes), there is probably no one-size-fits-all approach.
For small-to-medium-sized data collections, we should investigate the use of dachs datapack together with a bulk storage location; in this scenario, data providers would probably rsync datapacks to that location after major changes. When the fallback system has to kick in, the hub would retrieve and unpack the datapack, check out the RD in version control over the extract and then run the ingestion. Alternatively, there might be an extra dachs datapack mode that leaves existing files under version control alone. The online storage of data collections is also investigated in the VESPA-Cloud project.
In this document, we thus focus on providing a set of versioned repositories for the servers' metadata and the services' resource descriptors.
VESPA Hubs Repositories
The configuration of the VESPA assets is managed through git repositories hosted on GitLab servers at the VESPA-Hubs. The repositories are organised in two project trees, one for the server configurations, the other for the service content and configuration. We make use of the GitLab project group feature, which allows permissions and access to be managed through a directory-like hierarchical structure.
In the following section, the VESPA-Hub repository URL is referred to as gitlab_server_url. Each VESPA-Hub uses the same architecture, but with different server and service contents.
The server managers have to set up project groups with access rights according to the proposed architecture.
The two project trees are organised as follows:
- Server Repository Tree:
gitlab_server_url/vespa/<server_type>/servers/<institute>/<server_name>
- Service Repository Tree:
gitlab_server_url/vespa/<server_type>/services/<institute>/<server_name>/<service_name>
The proposed architecture is suitable for managing VESPA DaCHS servers and their associated VESPA EPNcore services. The elements of the repository tree are detailed below:
- At this point, the '<server_type>' layer has only one possible value: dachs. It will also be used for other servers and services (such as run-on-demand services, with a UWS server like OPUS).
- The '<institute>' should be an acknowledged name (by the provider or host, and by VESPA) of the hosting institute or data centre (e.g., PADC, JacobsUni, Heidelberg, CDPP, LATMOS, IAS-Orsay, IWF-Graz, IAP-Prag...). This part of the tree may have several levels for various sub-groups or projects, if required.
- The '<server_name>' is the domain name (or a short name) for the server (e.g., voparis-tap-planeto.obspm.fr or voparis-tap-planeto).
- The '<service_name>' is the schema name in the case of VESPA EPN-TAP services (e.g., planets, bass2000...).
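The two tree layouts described above can be illustrated with a small helper that assembles a repository URL from its elements. This is only a sketch: the function name build_repo_url and the host https://gitlab.example.org are hypothetical placeholders, not part of the VESPA-Hub setup.

```python
def build_repo_url(gitlab_server_url, server_type, institute, server_name,
                   service_name=None):
    """Assemble a repository URL following the two project trees.

    Without service_name the URL points into the server tree;
    with service_name it points into the service tree.
    """
    tree = "servers" if service_name is None else "services"
    parts = [
        gitlab_server_url.rstrip("/"),  # avoid a double slash after the host
        "vespa",
        server_type,                    # currently only "dachs"
        tree,
        institute.strip("/"),           # may span several levels, e.g. "PADC/sub-group"
        server_name,
    ]
    if service_name is not None:
        parts.append(service_name)
    return "/".join(parts)


# Hypothetical host; institute, server and service names are the examples listed above.
print(build_repo_url("https://gitlab.example.org", "dachs", "PADC",
                     "voparis-tap-planeto"))
print(build_repo_url("https://gitlab.example.org", "dachs", "PADC",
                     "voparis-tap-planeto", "planets"))
```

With the placeholder host, the first call yields a path under vespa/dachs/servers/ and the second one a path under vespa/dachs/services/, ending with the schema name.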
Authentication and Authorisation Infrastructure (AAI)
VESPA manages its users and groups through the VESPA community AAI (Authentication and Authorization Infrastructure), which has been set up at eduTEAMS (an in-kind service offered by GEANT to Europlanet-2024-RI). The eduTEAMS groups are used to select the access rights to the individual repositories. A mapping between the eduTEAMS AAI groups and those of the local GitLab server is required.
Access to eduTEAMS should be requested from the VESPA Helpdesk.
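The mapping between eduTEAMS groups and local GitLab groups mentioned above could, for instance, be expressed as a simple lookup table. The sketch below is purely illustrative: the eduTEAMS group names, the GROUP_MAP table and the gitlab_memberships function are assumptions made for this example, and the actual mapping mechanism used at a VESPA-Hub may differ. Only the role names and their numeric levels follow the standard GitLab access levels.

```python
# Standard GitLab access levels, used here to rank roles.
ROLE_RANK = {"guest": 10, "reporter": 20, "developer": 30,
             "maintainer": 40, "owner": 50}

# Hypothetical mapping: eduTEAMS group -> list of (GitLab group path, role).
GROUP_MAP = {
    "vespa-members": [("vespa", "reporter")],
    "vespa-hub-admins": [("vespa", "owner")],
    "vespa-padc-providers": [
        ("vespa/dachs/servers/PADC", "maintainer"),
        ("vespa/dachs/services/PADC", "maintainer"),
    ],
}


def gitlab_memberships(eduteams_groups):
    """Return {GitLab group path: role} for a user's eduTEAMS groups.

    Unknown eduTEAMS groups are ignored. If several groups grant a role
    on the same path, the highest role wins.
    """
    memberships = {}
    for group in eduteams_groups:
        for path, role in GROUP_MAP.get(group, []):
            current = memberships.get(path)
            if current is None or ROLE_RANK[role] > ROLE_RANK[current]:
                memberships[path] = role
    return memberships


print(gitlab_memberships(["vespa-members", "vespa-padc-providers"]))
```

Because GitLab group permissions are inherited by sub-groups, granting a role at the institute level of each tree is enough to cover all server and service repositories below it.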
Server Repository Tree
The server repository tree is used to manage server configuration information. Each DaCHS server repository contains the default_meta.txt and gavo.rc files. The AWStats configuration is also stored in this repository. Deployment scripts are also included (work in progress).
Access authorisation to each repository is managed through eduTEAMS (see AAI section).
Service Repository Tree
The service repository tree is used to manage the service configuration information. Each DaCHS-hosted service repository contains at least a resource descriptor file (usually named q.rd) and a ReadMe.md file. In addition, any further resources needed to import the data (e.g., external scripts and grammars, mapping files) or to run custom services, if present (e.g., external core definitions), must be checked into the repository.
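The minimal content of a service repository described above can be verified with a short script before pushing. This is a sketch under stated assumptions: the function name check_service_repo is hypothetical, and since the resource descriptor is only "usually" named q.rd, the check accepts any file with the .rd extension at the top level of the checkout.

```python
from pathlib import Path


def check_service_repo(repo_dir):
    """Return a list of problems found in a local service repository checkout.

    An empty list means the minimal required files are present.
    """
    repo = Path(repo_dir)
    problems = []
    # The resource descriptor is usually q.rd; accept any top-level *.rd file.
    if not any(repo.glob("*.rd")):
        problems.append("no resource descriptor (*.rd) found")
    if not (repo / "ReadMe.md").is_file():
        problems.append("ReadMe.md is missing")
    return problems


if __name__ == "__main__":
    # Check the current directory, assumed to be a service repository checkout.
    for problem in check_service_repo("."):
        print(problem)
```

The script does not look for ancillary resources (scripts, grammars, mapping files), since their names depend on each service.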
More details are provided in the Individual Repository for VESPA Service Resource Descriptor in DaCHS page.
Access authorisation to each repository is managed through eduTEAMS (see AAI section).
Access to the ObsParis VESPA-Hub server
The ObsParis VESPA-Hub hosts the main VESPA GitLab server, accessible at: https://voparis-gitlab.obspm.fr/vespa. Authenticated access to this server is enabled using eduTEAMS (see AAI section). Registered users shall sign in with eduTEAMS, which redirects them to a selection page where they can choose their preferred identity provider.
First connection
After the eduTEAMS invitation has been accepted and approved, the user shall connect once to the https://voparis-gitlab.obspm.fr server using eduTEAMS authentication. The user should then contact the VESPA Helpdesk again to confirm that access has been granted, and the VESPA-Hub team will add the user to the corresponding repositories and groups.