Please don't use this Document for now, we are currently updating it.
|| workpackage | WP6 | || task | 3 | || document number | 020 | || document version | 0.1 | || document title | Building the resource descriptor for your EPN-TAP service in DaCHS | || document type | TP |
EUROPLANET2020 Research Infrastructure
Grant agreement no: 654208
Start date of project: 01 September 2015
Duration: 48 Months
Responsible WP Leader: Stéphane Erard
Project co-funded by the European Union's Horizon 2020 research and innovation programme
Restricted to other programme participants (including the Commission Service)
Restricted to a group specified by the consortium (including the Commission Services)
Confidential, only for members of the consortium (excluding the Commission Services)
EPN2020 - RI
48 months: 01 September 2015 – 30 August 2019
Title of Document
Contributing Work package (s)
Document history (to be deleted before submission to Commission)
Added sections "Service Metadata" and "EPNcore Table Definition"
Table of Contents
This document describes how to build the Resource Descriptor (RD) for an EPN-TAP service using DaCHS. The full documentation of DaCHS is available at http://docs.g-vo.org/DaCHS/ref.html, including a section "Anatomy of an RD" that describes the RD structure and syntax in detail.
The RD file is an XML file. A property in the RD file can be set either as an XML child element or as an attribute of its parent element. The following two examples are equivalent: the first shows the attribute syntax, while the second shows the child element syntax.
- Attribute syntax:
<resource schema="my_service"> [...] </resource>
- Child syntax:
<resource> <schema>my_service</schema> [...] </resource>
The latter syntax is useful when the property itself has child properties.
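As a minimal sketch of how the two syntaxes are typically combined, the fragment below sets simple properties as attributes and structured properties as child elements. It only reuses names that appear later in this document (the epn_core table and the receiver_name column); it is an illustration, not a complete RD.

```xml
<resource schema="my_service">
  <!-- simple properties (id, onDisk) are written as attributes -->
  <table id="epn_core" onDisk="true">
    <!-- a column has properties of its own, so it is written as a child element -->
    <column name="receiver_name" type="text"
      description="Receiver name used with the instrument."/>
  </table>
</resource>
```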
The first property of an RD is the list of service metadata. They are specified in a series of <meta>[...]</meta> elements. The following <meta> elements should be present in your file:
|<meta> element name attribute||Content||Example|
|title||The title of your resource, i.e., the title of your database. It should be explicit: typically the expansion of the acronym or a short description of the service.|
|description||The description of your resource. This is the long description. Put here anything that could help users understand the content or find the resource with full-text search engines. Put yourself in the shoes of your fellow scientists when writing this part: it must be understandable by non-specialist scientists.|
|copyright||The copyright, rules of use and acknowledgments related to the resource and the data it serves. State the distribution licence here, if one has been selected. Specify the "rules of use" (also called "rules of the road" or "data use policy"). You can also give the acknowledgment policy and citation rules.|
|creationDate||The creation date of the resource descriptor (ISO-8601 formatted).|
|creator.name||The name of the creator of the resource (can be a person or an institute)|
There can be as many
At least one of the top-level keywords of the UAT must be provided (See this page). The typical list of interest for VESPA is:
|contact.email||The email address for questions and requests about the service. It is preferable to provide an alias address that points to one or a few people in your team. Using an individual's personal address here may break the process if that person leaves your institute and you do not update the resource descriptor.|
|contact.name||The name of the person to contact for questions and requests about the service|
|contact.address||The postal address of the institution or data center that distributes the resource.|
|referenceURL||An http URL that points to a description of the resource|
|facility||If you are serving observational data, you can give here the name of the observatory / spacecraft. Note that several names (including acronyms) could be provided in a #-separated list (see example).|
|instrument||If you are serving observational data, you can give here the name of the telescope / experiment / instrument.|
|source||This should be an ADS bibcode of a paper presenting the resource or the data served by the resource.|
|ContentLevel||In general, four of these elements are given, with the following values: "General", "University", "Research", "Amateur". You can restrict the list.|
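The fragment below is a minimal sketch of how such a metadata block could look at the top of an RD. It uses only element names listed in the table above, with the <meta name="..."> form; every value is a placeholder invented for illustration and must be replaced with the details of your own service.

```xml
<resource schema="my_service">
  <!-- All values below are placeholders, not taken from a real service -->
  <meta name="title">My Service: an example planetary data collection</meta>
  <meta name="copyright">Please acknowledge the data provider when using these data.</meta>
  <meta name="creator.name">Example Institute</meta>
  <meta name="contact.name">Example Service Support Team</meta>
  <meta name="contact.email">support@example.org</meta>
  <meta name="contact.address">1 Example Street, 00000 Example City, Country</meta>
  <meta name="referenceURL">http://example.org/my_service</meta>
  <!-- several names for the same facility, given as a #-separated list -->
  <meta name="facility">Example Observatory#EO</meta>
  <!-- the remaining meta elements and the rest of the RD go here -->
</resource>
```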
EPNcore table definition
The EPNcore table should be defined in the RD using the epntap2 mixin. This ensures that your EPN-TAP service is compliant with the EPNcore specification. The epntap2 mixin will be updated as needed via the DaCHS Debian package update. If you only plan to use the EPNcore mandatory parameters, your table definition section will be very simple:
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body">//epntap2#table-2_0</mixin>
</table>
This minimal table definition says:
- define an epn_core table
- write it on disk (i.e., do not keep it in RAM)
- activate ADQL for queries
- use the granule_uid column as the primary key of the table
- use the epntap2 template table with only mandatory parameters, and with spatial_frame_type = "body"
If you plan to use some optional parameters, as defined in the EPNcore specification, the table definition will look like:
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body"
    optional_columns="access_url access_format access_estsize thumbnail_url publisher bib_reference target_region feature_name"
    >//epntap2#table-2_0</mixin>
</table>
The optional_columns attribute tells the template engine to set up those extra columns, using their predefined definitions.
If you plan to use custom columns of your own, you have to define them in the table definition element, as shown in the following example:
<table id="epn_core" onDisk="true" adql="True" primary="granule_uid">
  <mixin spatial_frame_type="body"
    optional_columns="access_url access_format access_estsize thumbnail_url publisher bib_reference target_region feature_name"
    >//epntap2#table-2_0</mixin>
  <column name="receiver_name" type="text" ucd="meta.id"
    description="Receiver name used with the instrument." />
</table>
Each column element defines an extra column of the EPNcore table.
The metadata ingestion is done through a module called a Grammar in DaCHS jargon. Depending on the form of the metadata, different solutions are available. The output of the Grammar module is fed to a rowmaker module, which fills the table rows, applying transformations if necessary.
- Preprocessed metadata available as a CSV file: The data provider pre-processes the data collection to build a CSV file containing the EPNcore metadata, using the appropriate units and conventions. In this case, the csvGrammar shall be used.
- Individual data files available from the DaCHS server as FITS files: The data provider mounts a remote volume (e.g., through NFS) containing the data files. If the data format is FITS, the fitsProdGrammar can be used to load the FITS file headers.
- Individual data files available from the DaCHS server as CDF files: The data provider mounts a remote volume (e.g., through NFS) containing the data files. If the data format is CDF, the cdfHeaderGrammar can be used to load the CDF global attributes.
- The metadata is available in an external SQL database: The data provider has access to an SQL database containing the metadata (or data) to be loaded into the service. In this case, the odbcGrammar shall be used.
- If none of the previous cases applies: The data provider should use a customGrammar to load the metadata into DaCHS, through a dedicated Python script.
We show below a simple example with CSV files available from the resource descriptor directory.
<data id="import">
  <!-- Define where to retrieve the data -->
  <sources>
    <!-- Pattern is used when there are multiple source files -->
    <!-- (here all the .csv files in a data directory next to the q.rd file) -->
    <pattern>data/*.csv</pattern>
  </sources>
  <!-- we use the csvGrammar on the files defined in sources -->
  <csvGrammar/>
  <!-- now we send the data to the epn_core table -->
  <make table="epn_core">
    <!-- Inserts the data of each row made by the grammar in its column -->
    <rowmaker idmaps="*">
      <!-- idmaps="*" implies that any CSV column with the same name as an epn_core column is mapped without processing -->
      <!-- Insert non-varying data -->
      <var key="target_name">"Mars"</var>
      <var key="service_title">"\schema"</var>
      [...]
      <!-- Bind the columns required by EPN-TAP -->
      <apply procDef="//epntap2#populate-2_0" name="fillepn">
        <bind key="granule_uid">@granule_uid</bind>
        <bind key="granule_gid">@granule_gid</bind>
        <bind key="obs_id">@obs_id</bind>
        [...]
      </apply>
    </rowmaker>
  </make>
</data>
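For comparison, the FITS case listed above could be sketched along the same lines. This is an illustrative variant of the CSV example, not a tested configuration. It assumes that the FITS files sit in the same data directory, that their headers carry the standard OBJECT keyword, and that the grammar exposes header keywords as row keys. The data id import_fits is hypothetical, and the EPN-TAP binds are left out because they depend on the header keywords of your own files.

```xml
<data id="import_fits">
  <sources>
    <!-- assumption: FITS files are stored in a data directory next to the q.rd file -->
    <pattern>data/*.fits</pattern>
  </sources>
  <!-- read the header of each FITS file instead of CSV rows -->
  <fitsProdGrammar/>
  <make table="epn_core">
    <rowmaker idmaps="*">
      <!-- assumption: the OBJECT header keyword holds the target name -->
      <var key="target_name">@OBJECT</var>
      <var key="service_title">"\schema"</var>
      <apply procDef="//epntap2#populate-2_0" name="fillepn">
        <!-- binds required by EPN-TAP go here, built from your header keywords -->
      </apply>
    </rowmaker>
  </make>
</data>
```

Only the sources pattern, the grammar element and the origin of the row keys differ from the CSV example; the make and rowmaker structure stays the same.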
We list below a series of repositories using various grammar types: