Info |
---|
Please do not use this document for now; it is currently being updated. |
Metadata list | Value |
---|---|
workpackage | WP6 |
task | 3 |
document number | 020 |
document version | 0.1 |
document title | Building the resource descriptor for your EPN-TAP service in DaCHS |
document type | TP |
EPN2020-RI
EUROPLANET2020 Research Infrastructure
H2020-INFRAIA-2014-2015
Grant agreement no: 654208
Document: VESPA-
Start date of project: 01 September 2015
Duration: 48 Months
Responsible WP Leader: Stéphane Erard
Project co-funded by the European Union's Horizon 2020 research and innovation programme.
Dissemination level | Description |
---|---|
PU | Public |
PP | Restricted to other programme participants (including the Commission Services) |
RE | Restricted to a group specified by the consortium (including the Commission Services) |
CO | Confidential, only for members of the consortium (excluding the Commission Services) |
Project Number | 654208 |
Project Title | EPN2020 - RI |
Project Duration | 48 months: 01 September 2015 – 30 August 2019 |
Dissemination level | PU |
Document history (to be deleted before submission to Commission)
Date | Version | Editor | Change | Status |
---|---|---|---|---|
| 0.0 | | Initial version | |
| | | Added sections "Service Metadata" and "EPNcore Table Definition" | |
Introduction
This document describes how to build the Resource Descriptor (RD) for an EPN-TAP service using DaCHS. The full DaCHS documentation is available at http://docs.g-vo.org/DaCHS/ref.html, including a section "Anatomy of an RD" that describes the RD structure and syntax in detail.
Overall Structure
The RD file is an XML file. The properties of an RD element can be set either as XML child elements or as attributes of their parent element. The two following examples are equivalent: the first shows the attribute syntax, while the second illustrates the child-element syntax.
- Attribute syntax:
```xml
<resource schema="my_service">
  [...]
</resource>
```
- Child syntax:
```xml
<resource>
  <schema>my_service</schema>
  [...]
</resource>
```
The child syntax is useful when the property itself has child properties.
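The same equivalence holds for other RD elements. As a minimal sketch, the `receiver_name` column used later in this document can carry its description either as an attribute or as a child element; the child form tends to be easier to read when the text is long.

```xml
<!-- Attribute syntax -->
<column name="receiver_name" type="text" ucd="meta.id"
  description="Receiver name used with the instrument."/>

<!-- Child syntax: same column, description given as a child element -->
<column name="receiver_name" type="text" ucd="meta.id">
  <description>Receiver name used with the instrument.</description>
</column>
```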
Service Metadata
The first property of an RD is the list of service metadata. They are specified in a series of `<meta>[...]</meta>` elements. The following `<meta>` elements should be present in your file:
`name` attribute of the `<meta>` element | Content | Example |
---|---|---|
title | The title of your resource, i.e. the title of your database. It should be fairly explicit: typically the expansion of the acronym or a short description of the service. | |
description | The long description of your resource. Put here anything that could help readers understand the content or find the resource with full-text search engines. Put yourself in the shoes of your fellow scientists when writing this part: it must be understandable by non-specialist scientists. | |
copyright | The copyright, rules of use and acknowledgments related to the resource and to the data it serves. Indicate the distribution licence here if one has been selected. Specify the "rules of use", "rules of the road" or "data use policy". You can also give the acknowledgment policy and citation rules. | |
creationDate | The creation date of the resource descriptor (ISO 8601 formatted). | |
creator.name | The name of the creator of the resource (can be a person or an institute). | |
subject | There can be as many `subject` elements as needed. At least one of the top-level keywords of the UAT (Unified Astronomy Thesaurus) must be provided (see this page). The typical list of interest for VESPA is: | |
contact.email | The email address for questions and requests about the service. It is preferable to give users an alias address that points to one or a few people in your team. Using a personal address here may break the process if that person leaves your institute and the resource descriptor is not updated. | |
contact.name | The name of the person to contact for questions and requests about the service. | |
contact.address | The postal address of the institution or data center that distributes the resource. | |
referenceURL | An HTTP URL that points to a description of the resource. | |
facility | If you are serving observational data, you can give here the name of the observatory or spacecraft. Note that several names (including acronyms) can be provided in a #-separated list (see example). | |
instrument | If you are serving observational data, you can give here the name of the telescope, experiment or instrument. | |
source | This should be an ADS bibcode of a paper presenting the resource or the data present in the resource. | |
ContentLevel | In general, four such elements are given, with the following values: "General", "University", "Research", "Amateur". You can restrict the list. | |
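As a rough illustration of how these elements fit together, here is a minimal sketch of a metadata block. Every value is a hypothetical placeholder (service title, institute, addresses, URL, date), and the spelling of the `subject` keyword is an assumption that should be checked against the UAT; only the `name` attributes come from the table above.

```xml
<resource schema="my_service">
  <!-- All values below are hypothetical placeholders, not a real service -->
  <meta name="title">My Example Planetary Spectra Service</meta>
  <meta name="description">
    Long, plain-language description of the data collection, written
    for non-specialist scientists and for full-text search engines.
  </meta>
  <meta name="creationDate">2016-06-01T12:00:00Z</meta>
  <meta name="creator.name">Example Institute</meta>
  <!-- Assumed spelling of a UAT top-level keyword; repeat the element as needed -->
  <meta name="subject">Solar system astronomy</meta>
  <meta name="contact.name">Example Data Center Support</meta>
  <meta name="contact.email">support@example.org</meta>
  <meta name="referenceURL">http://example.org/my_service</meta>
  <!-- Several names in a #-separated list: full name and acronym -->
  <meta name="facility">Mars Express#MEX</meta>
  [...]
</resource>
```

The remaining elements from the table (copyright, contact.address, instrument, source, and so on) follow the same pattern.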
EPNcore table definition
The EPNcore table should be defined in the RD using the `epntap2` mixin. This ensures that your EPN-TAP service is compliant with the EPNcore specification. The `epntap2` mixin will be updated as needed through updates of the DaCHS Debian package. If you only plan to use the mandatory EPNcore parameters, your table definition section will be very simple:
```xml
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body">//epntap2#table-2_0</mixin>
</table>
```
This minimal table definition says:
- define an `epn_core` table
- write it on disk (i.e., do not keep it in RAM)
- activate ADQL for query
- use the `granule_uid` column for the primary key of the table
- use the `epntap2` template table with only mandatory parameters, and with `spatial_frame_type = "body"`
Info |
---|
The |
If you plan to use some optional parameters, as defined in the EPNcore specification, the table definition will look like:
```xml
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body"
    optional_columns="access_url access_format access_estsize thumbnail_url publisher bib_reference target_region feature_name"
    >//epntap2#table-2_0</mixin>
</table>
```
The extra `optional_columns` attribute tells the template engine to set up those extra columns, as they are defined in the `epntap2` mixin.
If you plan to use custom columns of your own, you have to define them in the table definition element, as shown in the following example:
```xml
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body"
    optional_columns="access_url access_format access_estsize thumbnail_url publisher bib_reference target_region feature_name"
    >//epntap2#table-2_0</mixin>
  <column name="receiver_name" type="text" ucd="meta.id"
    description="Receiver name used with the instrument." />
</table>
```
The `column` element defines an extra column of the EPNcore table.
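For a numeric custom column, it is usually worth declaring a unit as well. The following is a minimal sketch under stated assumptions: the `integration_time` column is hypothetical and not part of EPNcore, and the type, unit and UCD shown are illustrative choices to adapt to your own data.

```xml
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body">//epntap2#table-2_0</mixin>
  <!-- Hypothetical custom column: a floating-point value in seconds -->
  <column name="integration_time" type="double precision" unit="s"
    ucd="time.duration;obs.exposure"
    description="Integration time of the measurement." />
</table>
```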
Data ingestion
Metadata ingestion is done through a module called a grammar in DaCHS jargon. Depending on the form of the metadata, different grammars are available. The output of the grammar is fed to a rowmaker module, which fills the table rows, applying transformations if necessary.
- Preprocessed metadata available as a CSV file: The data provider pre-processes the data collection to build a CSV file containing the EPNcore metadata, using the appropriate units and conventions. In this case, the csvGrammar shall be used.
- Individual data files available from the DaCHS server as FITS files: The data provider mounts a remote volume (e.g., through NFS) containing the data files. If the data format is FITS, the fitsProdGrammar can be used to load the FITS file headers.
- Individual data files available from the DaCHS server as CDF files: The data provider mounts a remote volume (e.g., through NFS) containing the data files. If the data format is CDF, the cdfHeaderGrammar can be used to load the CDF global attributes.
- The metadata is available in an external SQL database: The data provider has access to an SQL database containing the metadata (or data) to be loaded into the service. In this case, the odbcGrammar shall be used.
- If none of the previous cases applies: The data provider should use a customGrammar to load the metadata into DaCHS through a dedicated Python script.
We show below a simple example with CSV files located in a data directory next to the resource descriptor.
```xml
<data id="import">
  <!-- Define where to retrieve the data -->
  <sources>
    <!-- Pattern is used when there are multiple source files -->
    <!-- (here all the .csv files in a data directory next to the q.rd file) -->
    <pattern>data/*.csv</pattern>
  </sources>

  <!-- we use the csvGrammar on the files defined in sources -->
  <csvGrammar/>

  <!-- now we send the data to the epn_core table -->
  <make table="epn_core">
    <!-- Inserts the data of each row made by the grammar in its column -->
    <rowmaker idmaps="*">
      <!-- idmaps="*" implies that any CSV column with the same name
           as an epn_core column is mapped without processing -->
      <!-- Insert non-varying data -->
      <var key="target_name">"Mars"</var>
      <var key="service_title">"\schema"</var>
      [...]
      <!-- Bind the columns required by EPN-TAP -->
      <apply procDef="//epntap2#populate-2_0" name="fillepn">
        <bind key="granule_uid">@granule_uid</bind>
        <bind key="granule_gid">@granule_gid</bind>
        <bind key="obs_id">@obs_id</bind>
        [...]
      </apply>
    </rowmaker>
  </make>
</data>
```
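For the FITS case listed above, the overall structure stays the same and mainly the sources and the grammar change. The following is a minimal sketch, not a tested configuration: the `data/*.fits` location is hypothetical, and the use of an `OBJECT` header card for the target name is an assumption about the files. The bindings required by EPN-TAP are elided and would follow the CSV example.

```xml
<data id="import_fits">
  <sources>
    <!-- Hypothetical location: FITS files in a data directory next to the q.rd file -->
    <pattern>data/*.fits</pattern>
  </sources>

  <!-- One FITS file yields one row; header cards become rowmaker variables -->
  <fitsProdGrammar/>

  <make table="epn_core">
    <rowmaker idmaps="*">
      <!-- Assumes the primary header carries an OBJECT card -->
      <var key="target_name">@OBJECT</var>
      [...]
    </rowmaker>
  </make>
</data>
```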
We list below a series of repositories using various grammar types: