Article under development; it is not yet finished or reviewed, do not take it into account for now.

Introduction

Setting up an EPN-TAP service means publishing, in a DaCHS server, a table describing the data we want to expose. This table references each file with as much information as possible, by filling in the parameters defined by the EPN-TAP standard, which together form the epn_core view. The EPN-TAP standard defines mandatory parameters (most of them may be left empty) and optional parameters; additional columns can also be defined to provide service-specific information. VESPA is a way to query the registered services published on the various DaCHS servers, using keywords or ADQL queries.

This tutorial provides a very simple example to help beginners understand how to set up an EPN-TAP service. For this purpose, we will set up a service called "Planets" providing information on the Solar System planets. It only documents the method using a CSV (Comma Separated Values) file to fill the table.

For this example, we want to display the following data in our service:

| name | mean radius (km) | mean radius uncertainty (km) | equatorial radius (km) | equatorial radius uncertainty (km) | polar radius (km) | polar radius uncertainty (km) | rms deviation (km) | elevation max (km) | elevation min (km) | mass (kg) | distance to primary (km) | sidereal rotation period (h) |
| Mercury | 2439.7 | 1.0 | 2439.7 | 1.0 | 2439.7 | 1.0 | 1 | 4.6 | 2.5 | 3.3014E23 | 57909227. | 1407.504 |
| Venus | 6051.8 | 1.0 | 6051.8 | 1.0 | 6051.8 | 1.0 | 1 | 11 | 2 | 4.86732E24 | 108209475. | -5832.432 |
| Earth | 6371.00 | 0.01 | 6378.14 | 0.01 | 6356.75 | 0.01 | 3.57 | 8.85 | 11.52 | 5.97219E24 | 149598262. | 23.93447232 |
| Mars | 3389.5 | 0.2 | 3396.19 | 0.1 | 3376. | 0.1 | 3.0 | 22.64 | 7.55 | 6.41693E23 | 227943824. | 24.624 |
| Jupiter | 69911. | 6 | 71492. | 4 | 66854. | 10 | 62.1 | 31 | 102 | 1.89813E27 | 778340821. | 9.92496 |
| Saturn | 58232. | 6 | 60268. | 4 | 54364. | 10 | 102.9 | 8 | 205 | 5.68319E26 | 1426666422. | 10.656 |
| Uranus | 25362. | 7 | 25559. | 4 | 24973. | 20 | 16.8 | 28 | 0 | 8.68103E25 | 2870658186. | -17.232 |
| Neptune | 24622. | 19 | 24764. | 15 | 24341. | 30 | 8 | 14 | 0 | 1.0241E26 | 4498396441. | 16.104 |

Steps to follow

First, a virtual machine hosting a DaCHS server must be set up. For simple services like this one, the metadata are gathered in a CSV file containing the useful features; it is also possible to use a Python script to generate the metadata. Then, a Resource Descriptor (RD) must be written in XML. This RD defines the columns, reads the CSV file and fills a table on the DaCHS server. Finally, the service must be registered on the VESPA portal.


First, it is necessary to have set up a server hosting DaCHS. The method is described in this tutorial: EPN-TAP Server Installation for VESPA Data Provider Tutorial#TAPServerInstallationforVESPADataProviderTutorial-6.AWStatsInstallationandConfigurationincludingApache .


To publish your own service, you have to define granules. Each granule can link to a file (through the optional parameter access_url). Each granule must have a unique identifier, which is the primary key of the table (the mandatory parameter granule_uid). Each type of granule must also have its own identifier (the granule_gid parameter).

In our example, all granules have the same type and each corresponds to one planet. So the granule_gid and granule_uid will respectively be "Planet" and the name of the planet. There is no need to define access_url because no file is linked in this example.
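As an illustration only, the first lines of the CSV file could then look like this (a hypothetical excerpt: the actual file on GitHub contains more columns and may use a different layout):

granule_uid,granule_gid,target_name
Mercury,Planet,Mercury
Venus,Planet,Venus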


Once the granules are defined, you have to browse the EPN-TAP V2 parameters and their descriptions to see which ones can be filled in. In order to clearly define the organization of the final table, it is advised to fill in a scheme with the EPN-TAP parameters on one side and what you decide to put into each of them on the other side (available here in xls format). A large part of the mandatory parameters may be left empty; other parameters can be defined as additional columns.


Metadata which vary from one granule to another, or which cannot be post-processed in the Resource Descriptor, must be gathered in a CSV file. The first row contains the column names, then each following row gives the values for one granule. This CSV file will later be read by the RD to fill the table in DaCHS. You can use your favourite programming language to create this CSV file.
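For instance, a minimal Python sketch writing such a file could look like this (the column names, values and file name are illustrative assumptions, not the exact layout of the file used below):

# write_planets_csv.py -- minimal sketch of CSV generation (illustrative only)
import csv

# Hypothetical subset of columns; the real service provides many more EPN-TAP parameters.
rows = [
    {"granule_uid": "Mercury", "granule_gid": "Planet", "target_name": "Mercury", "mass": "3.3014E23"},
    {"granule_uid": "Venus",   "granule_gid": "Planet", "target_name": "Venus",   "mass": "4.86732E24"},
]

with open("Masses2.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["granule_uid", "granule_gid", "target_name", "mass"])
    writer.writeheader()       # first row: column names
    writer.writerows(rows)     # one row per granule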

For this example, we provide a hand-written CSV; you can download it, together with the associated Resource Descriptor, from GitHub:

https://github.com/epn-vespa/DaCHS-for-VESPA/tree/master/q.rd_examples/planets


The Resource Descriptor (RD) is an XML file named q.rd which tells DaCHS how to fill the table from the CSV file; it follows the DaCHS standards for EPN-TAP. More information on building the RD is available here.

Structure of the Resource Descriptor

The RD file, named q.rd, has the following structure:

<resource schema="...">
<meta .../>
...
<meta .../>

<table ...>
<mixin .../>

<column .../>
...
<column .../>
</table>

<data id="import">
<sources .../>

<csvGrammar> <rowfilter procDef="//products#define"> <bind name="table">"\schema.epn_core"</bind> </rowfilter> </csvGrammar>

<make table="epn_core">
<rowmaker idmaps="*">

<var key="...">...</var>
...
<var key="...">...</var>

<apply procDef="//epntap2#populate-2_0" name="fillepn">
<bind name="...">@...</bind>
...
<bind name="...">@...</bind>
</apply>
         </rowmaker>
      </make>
   </data>
</resource>

The whole file is contained in the <resource> tag. First, <meta> elements are filled in; then the <table> is defined, containing the <mixin> reference and the <column> elements defining extra columns. Inside the <data> tag, the data ingestion rules are set: the path of the source file is given in <sources> and the <csvGrammar> is specified. The <make><rowmaker> content describes how the mandatory and added columns will be filled: <var> elements set column values, while <bind> elements in the <apply> tag link the columns to their values and fill the table.


Meta tags

The first part is a set of meta tags with different attributes which define the global characteristics of the table.

<resource schema="planets">
<meta name="title">Characteristics of Planets (demo)</meta>
<meta name="description" format="plain">
Main characteristics of planets. Data are included in the table, therefore most relevant parameters are non-standard in EPN-TAP. Data are retrieved from Archinal et al 2009 (IAU report, 2011CeMDA.109..101A) [radii] and Cox et al 2000 (Allen's astrophysical quantities, 2000asqu.book.....C) [masses, heliocentric distances, and rotation periods]. </meta>
<meta name="creationDate">2015-08-16T09:42:00Z</meta>
<meta name="subject">planet</meta>
<meta name="subject">mass</meta>
<meta name="subject">radius</meta>
<meta name="subject">period</meta>
<meta name="copyright">LESIA-Obs Paris</meta>
<meta name="creator.name">Stephane Erard</meta>
<meta name="publisher">Paris Astronomical Data Centre - LESIA</meta>
<meta name="contact.name">Stephane Erard</meta>
<meta name="contact.email">vo.paris@obspm.fr</meta>
<meta name="contact.address">Observatoire de Paris VOPDC, bat. Perrault, 77 av. Denfert Rochereau, 75014 Paris, FRANCE</meta>
<meta name="source">2000asqu.book.....C</meta>
<meta name="contentLevel">General</meta>
<meta name="contentLevel">University</meta>
<meta name="contentLevel">Research</meta>
<meta name="contentLevel">Amateur</meta>
<meta name="utype">ivo://vopdc.obspm/std/EpnCore#schema-2.0</meta>

Most of these attributes are easy to understand; see this page for detailed explanations and more meta elements.

Meta attribute "subject" is defined several times by different keywords defining data. "source" refers to the resource-related paper. Here, "contentLevel" takes the four values "General", "University", "Research", "Amateur" but it could take only some of these.


Table definition

Then, the <table> definition starts. In every EPN-TAP service, the table id and the <mixin> must respectively take the values "epn_core" and "//epntap2#table-2_0".

The spatial_frame_type attribute defines the type of coordinate system used for the granules. It can take several values (see the spatial_frame_type section in the EPN-TAP V2 parameter description), and its value determines the meaning of the coordinate parameters c1, c2 and c3.

You may list the predefined optional EPN-TAP parameters you have chosen to add in the optional_columns attribute.

Here, spatial_frame_type is set to "celestial", which means the coordinates are defined as in ICRS (not relevant here, so they will not be filled in below). The optional columns time_scale, publisher and bib_reference have been added to the table.

   <table id="epn_core" onDisk="true" adql="True">

      <mixin spatial_frame_type="celestial"
      optional_columns= "time_scale publisher bib_reference" >//epntap2#table-2_0</mixin>


After the mixin definition, you can define the extra parameters with the <column> tag. To do so, you should define the attributes name, type, tablehead, unit (if relevant), description, ucd (a set of keywords defining the type of data, see the IVOA UCD documentation) and verbLevel (a value, typically below 30, indicating the column's importance). Once the extra columns are set, the table definition is complete.

      <column name="distance_to_primary" type="double precision"
tablehead="Distance_to_primary" unit="km"
description="Extra: Mean heliocentric distance (semi-major axis)"
ucd="pos.distance;stat.min"
verbLevel="2"/>
 <column name= ... />
...
<column name= ... />

</table>
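For instance, the mass column of this service could be declared along the same lines (a sketch only; the exact description and UCD used in the published RD may differ):

      <column name="mass" type="double precision"
         tablehead="Mass" unit="kg"
         description="Extra: Mass of the planet"
         ucd="phys.mass"
         verbLevel="2"/>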
Data ingestion

Data ingestion starts with the <data> tag. When data are imported from a CSV file, its id is set to "import" and the CSV path must be given in the <sources> element. The <csvGrammar> for EPN-TAP contains the global ingestion rules. The <rowfilter> defined here allows an automatic assignment between CSV columns and table columns (mandatory, optional and added columns defined earlier) that have the same name. It is possible to define another rowfilter to add special ingestion rules (see an example here).

   <data id="import">
<sources>data/Masses2.csv</sources>
<csvGrammar>
<rowfilter procDef="//products#define">
<bind name="table">"\schema.epn_core"</bind>
</rowfilter>
</csvGrammar>


Still inside the <data> tag, the element <make table="epn_core"> fills the columns of the epn_core table. <rowmaker idmaps="*"> means that every value read from the CSV is mapped to the table column with the same name.

The <var> elements associate values with the fields which have not been filled automatically, or which need post-processing:

Constant value columns must be set like:

<var key="{column name}">{constant_value}</var> 

Varying value columns could be defined by:

<var key="{column name}" source="{csv_column name}"/> 

or 

<var key="{EPN TAP column name}" >{varying_value}</var>

Here {varying_value} can reference another column with the @ prefix, or post-process it with simple Python operations (e.g. <var key="column">@column1+"text"+@column2[2:8]</var>); this second method is not illustrated in the Planets service. For more complex post-processing, it is possible to use a <code> tag in the rowfilter (see here for an example).
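As a purely illustrative sketch (not part of the Planets RD), a granule_uid could for instance be derived from a hypothetical target_name column read from the CSV:

      <var key="granule_uid">@target_name.lower()</var>

In the Planets service itself, the <var> elements look like this: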

      <make table="epn_core">
<rowmaker idmaps="*">
<var key="dataproduct_type">"ci"</var>
...
<var key="publisher">"LESIA" </var>


<bind> elements in the <apply> tag link the fields set in <var> to their new values and fill the table rows; they are only needed for columns which are not filled automatically. Some constant-value columns can also be defined directly in <bind> elements, like <bind name="target_class">"planet"</bind>.

(Note: some var elements in the planets example, like <var key="target_name" source="target_name" />, are redundant with the automatic filling and therefore not necessary; this is why they do not appear in the bind elements below.)

            <apply procDef="//epntap2#populate-2_0" name="fillepn">
<bind name="granule_gid">@granule_gid</bind>
...
<bind name="release_date">@release_date</bind>
</apply>

Once the bind elements are defined, the table construction is finished and the service can be published.


To publish your service on your DaCHS server, you have to create a directory containing your q.rd and your CSV file in the DaCHS inputs directory: /var/gavo/inputs/{servicename}, where servicename is the value of <resource schema="..."> in the RD. In this directory, the RD must be named q.rd.

For our example, from your "planets" working directory and as root (or using sudo), type:

# mkdir /var/gavo/inputs/planets
# mkdir /var/gavo/inputs/planets/data
# cp planets_mixin_q.rd /var/gavo/inputs/planets/q.rd
# cp Masses2.csv /var/gavo/inputs/planets/data/Masses2.csv

Then, move into this directory:

# cd  /var/gavo/inputs/planets/

And check the syntax of your RD with the command gavo val. If it returns "OK", you can import your service into your DaCHS server with the command gavo imp; if no error is raised, "Rows affected: {...}" will appear. Then restart the server to check the newly set-up service:

# gavo val q.rd

# gavo imp q.rd

# gavo serve restart

These 3 steps are necessary each time the q.rd is modified.

After that, in your browser, open the address:

http://<<my_servername>>.<<my_domain>>:8000/__system__/dc_tables/list  

where <<my_servername>> and <<my_domain>> were set during your server installation. Click on the "table info" link next to your service name: you can now check the list of columns, their descriptions and the metadata on the left. It is also possible to send ADQL queries to query the service.

For example, to select the whole Planets table with an ADQL query, you can type:

SELECT * FROM planets.epn_core
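Queries can also filter on specific columns, for instance using the distance_to_primary extra column defined above:

SELECT granule_uid, distance_to_primary FROM planets.epn_core WHERE distance_to_primary > 7E8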

 See Checking your VESPA service for more information on checking your service.


To make your own service available on the VESPA portal, you may follow this tutorial: Registering your VESPA EPN-TAP Server. It consists in configuring and publishing your DaCHS server in the IVOA Registry of Registries (RoR) and publishing your TAP service, in order to make it available in the VESPA query interface.

The Planets service should not be registered again, as it is already available in VESPA.

To go further ...

Another example, with more detailed explanations on setting up an EPN-TAP service: the iks service.

Another method to fill the table: using a custom grammar with a Python routine.