Setting up an EPN-TAP service means publishing a table on a server (DaCHS for this tutorial) describing the data we want to distribute. This table references each piece of information with as many attributes as possible, by filling in the parameters defined by the EPN-TAP standard in a view called epn_core. The EPN-TAP standard has mandatory parameters (most of which can be left empty) and optional parameters; in addition, extra parameters can be defined to provide specific information. EPN-TAP is a way to query registered services published on different DaCHS servers, using parameters or ADQL queries.

...

First, it is necessary to have set up an EPN-TAP server. The method to set up a virtual machine hosting a DaCHS server is described in this tutorial: EPN-TAP Server Installation for VESPA Data Provider Tutorial. You can also build it in a Docker container.


  • Define granules

To publish your own service, you have to define granules. Granules correspond to table rows; they represent the smallest piece of information accessible in the service. A granule can be one file (linked with the optional parameter access_url) or a set of parameters described in the table. A granule must have a unique identifier, which is a primary key for the table (the mandatory parameter for this identifier is granule_uid). Each type of granule must also have an identifier (the granule_gid parameter), so that data can be grouped by type.
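
As a quick sanity check before ingestion, the two identifier rules above can be tested on the CSV itself. The following is only a minimal sketch: the sample rows are invented for illustration, and it assumes the CSV already carries columns named granule_uid and granule_gid.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content: the column names follow the EPN-TAP parameters
# named in the text, the values are made up for illustration.
SAMPLE_CSV = """granule_uid,granule_gid,obs_id
mercury,planet,mercury
venus,planet,venus
earth,planet,earth
"""


def duplicated_uids(csv_text):
    """Return the sorted granule_uid values that occur more than once."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["granule_uid"] for row in rows)
    return sorted(uid for uid, n in counts.items() if n > 1)


def granules_by_gid(csv_text):
    """Group granule_uid values by their granule_gid (the granule type)."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["granule_gid"], []).append(row["granule_uid"])
    return groups
```

An empty list from duplicated_uids means granule_uid can serve as a primary key; granules_by_gid shows how rows would be grouped by type.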

...

For this example, we give the link to a hand-written CSV; you can download it and the associated resource descriptor from GitLab:

https://voparis-gitlab.obspm.fr/workshop-2021-material/planets_for_tuto


  • Building a resource descriptor

...

<resource schema="...">
<meta .../>
...
<meta .../>

<table ...>
<mixin .../>

<column .../>
...
<column .../>
</table>

<data id="import">
<sources .../>

<csvGrammar>
<rowfilter procDef="//products#define">
<bind name="table">"\schema.epn_core"</bind>
</rowfilter>
</csvGrammar>

<make table="epn_core">
<rowmaker idmaps="*">
<map key="...">...</map>
...
<map key="...">...</map>
</rowmaker>
</make>
</data>
</resource>

The tag <resource> encompasses the others. First, the <meta> elements are filled in; then the <table> is defined, containing the <mixin> reference and the <column> elements that define extra columns. Inside the <data> tag, the data ingestion rules are set: the path of the source file is given in <sources> and the <csvGrammar> is specified. The content of <make><rowmaker> describes how the mandatory and added columns will be filled: <map> elements set the column values and fill the table.


Meta tags

The first part is a set of meta tags with different name attributes, which define global characteristics of the table. Meta tags aim to describe the service in the registry.

<resource schema="planets">
    <meta name="title">Characteristics of Planets (demo)</meta>
    <meta name="description" format="plain">
    Main characteristics of planets. Data are included in the table, therefore most relevant parameters are non-standard in EPN-TAP. Data are retrieved from Archinal et al 2009 (IAU report, 2011CeMDA.109..101A) [radii] and Cox et al 2000 (Allen's astrophysical quantities, 2000asqu.book.....C) [masses, heliocentric distances, and rotation periods]. </meta>
    <meta name="creationDate">2015-08-16T09:42:00Z</meta>
    <meta name="subject">solar-system-astronomy</meta>
    <meta name="subject">planetary-science</meta>
    <meta name="subject">solar-system-planets</meta>
    <meta name="subject">periodic-orbit</meta>
    <meta name="copyright">LESIA-Obs Paris</meta>

    <meta name="creator.name">Stephane Erard</meta>
    <meta name="publisher">Paris Astronomical Data Centre - LESIA</meta>
    <meta name="contact.name">Stephane Erard</meta>
    <meta name="contact.email">vo.paris@obspm.fr</meta>
    <meta name="contact.address">Observatoire de Paris VOPDC, bat. Perrault, 77 av. Denfert Rochereau, 75014 Paris, FRANCE</meta>
    <meta name="source">2000asqu.book.....C</meta>
    <meta name="contentLevel">General</meta>
    <meta name="contentLevel">University</meta>
    <meta name="contentLevel">Research</meta>
    <meta name="contentLevel">Amateur</meta>

Most of the attributes are self-explanatory; see this page for detailed explanations and more meta elements.

The "subject" meta is defined several times, each with a different keyword from the UAT (Unified Astronomy Thesaurus) describing the data. At least one of them must refer to a global topic listed in this page. In the context of VESPA, the three appropriate global topics are "Exoplanet astronomy", "Solar physics" and "Solar system astronomy". The "source" meta refers to the paper related to the resource. Here, "contentLevel" takes the four values "General", "University", "Research" and "Amateur", but it could take only some of these.
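
The rule on subject keywords can be checked mechanically. The sketch below is illustrative only: it assumes that keywords are written as the lowercase, hyphenated form of the topic label (as "solar-system-astronomy" in the planets example suggests), and the helper names are hypothetical.

```python
# The three global topics named in the text for the VESPA context.
GLOBAL_TOPICS = ("Exoplanet astronomy", "Solar physics", "Solar system astronomy")


def to_keyword(label):
    """Turn a topic label into a lowercase, hyphenated keyword (assumed form)."""
    return "-".join(label.lower().split())


def has_global_topic(subjects):
    """Return True if at least one subject keyword matches a global topic."""
    allowed = {to_keyword(topic) for topic in GLOBAL_TOPICS}
    return any(subject.strip().lower() in allowed for subject in subjects)
```

With the four subject values of the planets service, has_global_topic returns True because "solar-system-astronomy" is present.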


Table definition

Then the <table> definition starts. In every EPN-TAP service, the table id attribute and the <mixin> content must take the values "epn_core" and "//epntap2#table-2_0" respectively.

The spatial_frame_type attribute defines the type of coordinate system used for the granules. It can take several values (listed in the spatial_frame_type section of the EPN-TAP V2 parameter description), and its choice affects the definition of the coordinates (parameters c1, c2 and c3).

The mixin "//epntap2#table-2_0" provides a standard definition of the mandatory parameters and some optional ones. Mandatory parameters are automatically present in the table, and you may list the predefined optional columns you want to include in the optional_columns attribute. Additional columns can then be added, but they must be defined manually in <column> tags.

The three optional columns time_scale, publisher and bib_reference are added to the table in this example.

   <table id="epn_core" onDisk="true" adql="True">

     <mixin spatial_frame_type="none"
      optional_columns= "time_scale publisher bib_reference" >//epntap2#table-2_0</mixin>

After the mixin definition, you can start defining extra parameters with the <column> tag. To do that, you should define the attributes name, type, tablehead, unit (if relevant, listed here), description, ucd (a set of keywords which defines the type of data, see the IVOA UCD documentation) and verbLevel (a value under 30 indicating the column's importance). Once the extra columns are set, the table definition is complete.

      <column name="distance_to_primary" type="double precision"
tablehead="Distance_to_primary" unit="km"
description="Extra: Mean heliocentric distance (semi-major axis)"
ucd="pos.distance;stat.min"
verbLevel="2"/>
 <column name= ... />
...
<column name= ... />

</table>

Data ingestion

Data ingestion starts with the <data> tag. For data imported from a CSV, id must take the value "import" and the CSV path must be given in the <sources> element. The <csvGrammar> for EPN-TAP contains global ingestion rules. The <rowfilter> defined here automatically matches CSV columns to table columns (mandatory, optional and added columns defined earlier) that have the same name. It is possible to define another rowfilter to add special ingestion rules (see an example here).
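
To anticipate which columns will be filled by this name-based assignment and which will need an explicit rule, the CSV header can be compared with the table columns. This is only a plain-Python picture of the matching described above, not DaCHS code; the function name and the example lists are hypothetical.

```python
def split_columns(csv_header, table_columns):
    """Compare CSV column names with table column names.

    Returns three lists:
    - names present in both (filled automatically by name),
    - CSV names with no table column of the same name,
    - table columns with no CSV column of the same name
      (these need an explicit rule or stay empty).
    """
    csv_names = set(csv_header)
    table_names = set(table_columns)
    matched = [name for name in csv_header if name in table_names]
    csv_only = [name for name in csv_header if name not in table_names]
    table_only = [name for name in table_columns if name not in csv_names]
    return matched, csv_only, table_only


# Hypothetical header and table definition, for illustration only.
matched, csv_only, table_only = split_columns(
    ["granule_uid", "obs_id", "distance_to_primary", "comment"],
    ["granule_uid", "obs_id", "measurement_type", "distance_to_primary"],
)
```

In this example measurement_type ends up in table_only, which is why it would be given a value explicitly in the rowmaker.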

   <data id="import">
<sources>Masses2.csv</sources>
<csvGrammar>
<rowfilter procDef="//products#define">
<bind name="table">"\schema.epn_core"</bind>
</rowfilter>
</csvGrammar>


Still inside the <data> tag, the <make table="epn_core"> element fills the columns of the table epn_core.

The <map> elements assign values to the columns that have not been filled automatically by the rowfilter or that need post-processing:

Constant value columns must be set like:

<map key="{column name}">{constant_value}</map> 

Varying value columns could be defined by:

<map key="{EPN TAP column name}" source="{csv column name}"/> 

or 

<map key="{EPN TAP column name}" >{varying_value}</map>

Here {varying_value} can refer to another column with the prefix @, or post-process it with simple Python operations (e.g. <map key="column">@column1+"text"+@column2[2:8]</map>); this second method is not illustrated in the planets service. For more complex post-processing, it is possible to use a <code> tag in the rowfilter (see here for an example).
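
To make the example expression concrete, here is what it computes in plain Python, assuming each @name simply stands for the value of that field in the current row. The row values below are invented for illustration.

```python
# Plain-Python illustration of the map expression
#   @column1+"text"+@column2[2:8]
# Each @name is taken to be the value of that field in the current row.
row = {"column1": "abc", "column2": "0123456789"}  # hypothetical row values


def combine(row):
    """Concatenate column1, a literal string, and characters 2 to 7 of column2."""
    return row["column1"] + "text" + row["column2"][2:8]


result = combine(row)  # "abc" + "text" + "234567"
```

The slice [2:8] follows the usual Python convention: it starts at index 2 and stops before index 8, so six characters are kept.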

      <make table="epn_core">
         <rowmaker idmaps="*">
            <map key="obs_id" source="obs_id" />
            ...
            <map key="measurement_type">"phys.mass#phys.size.radius"</map>
         </rowmaker>
      </make>
   </data>
</resource>

Once the map elements are defined, the table construction is finished and the service can be published.

...

$ sudo mkdir /var/gavo/inputs/planets
$ sudo mkdir /var/gavo/inputs/planets/data

Then go to the directory in which you have downloaded the RD and the CSV of the example "planets" from gitlab (here we assume its path is ~/planets) and copy these files into the directory previously created:

...

Another method is to fill the table with a custom grammar using a Python routine.

It is also possible to import data from MySQL or PostgreSQL databases in the resource descriptor using odbcgrammar.