EUROPLANET2024 Research Infrastructure
Grant agreement no: 871149
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
Digital Object Identifiers for VESPA
Start date of project: 01 February 2020
Duration: 48 Months
Responsible WP Leader: Stéphane Erard
Project co-funded by the European Union's Horizon 2020 research and innovation programme
Restricted to other programme participants (including the Commission Service)
Restricted to a group specified by the consortium (including the Commission Services)
Confidential, only for members of the consortium (excluding the Commission Services)
EPN2024 - RI
48 months: 01 February 2020 – 31 January 2024
Title of Document
Digital Object Identifiers for VESPA
Contributing Work package (s)
Abstract: This document describes the use of Digital Object Identifiers (DOI) within the VESPA activity.
Document history (to be deleted before submission to Commission)
Added ROR entries for VESPA participants
Added ORCID examples
More details on metadata recommendation
Added reference to PDS-SBN DataCite notes
Added section on Landing Page
Added reference to RDA FAIR data maturity model indicators, and added REC-DOI-02 (renumbered the recommendations)
Added other identifiers
Digital Object Identifiers (DOI) are persistent identifiers managed by a Registration Agency (RA), such as DataCite. Persistent identifiers allow long-term traceability, identification and citation of online resources. We propose to use DOIs for VESPA documents, data collections and software. This document presents the guidelines for the use of DOIs within VESPA.
In this document, we consider the DOI implementation and metadata as managed by DataCite.
A DOI is persistent and so is the metadata stored by the RA. Data centres using DOIs are registered with an RA and commit to keep a landing page for each DOI. This landing page provides either access to the object, or up-to-date information about the object. The landing page URL is stored in the RA database, and may be updated as often as necessary by the data centre.
Should an object disappear, or be removed for any reason, the landing page must be updated, with adequate information: reason for deletion, new version available if any. When the object is not available any more in the version described by the DOI, the landing page becomes a tombstone page.
Most universities can mint DOIs through direct contracts with an RA or through relevant national information technology institutions (e.g., through INIST, in France). Other data centres or projects offer to mint DOIs for hosted products. One example is Zenodo.org: anybody can upload content (up to 50 GB) and get a DOI for a document, a dataset, or any other type of digital content.
DOIs are useful for citation and reference, as we should not rely on world wide web URLs, which are known to die eventually. DOIs can be used for any type of citable (digital or physical) objects. We list below the use cases for VESPA.
Any public deliverables, or other reports and documents.
Identifiers on data can be tricky, and there have been a lot of discussions in many contexts on that topic. There must be a decision on granularity for each project. The granularity is usually set to the data collection level rather than the data file level. However, the data collection should be defined in each context.
Note that continuously growing data collections should be seen as a single collection (no versioning required for every new data addition), unless the new data modifies previously recorded data. If recalibration occurs or metadata are significantly altered, then a new identifier should be minted, with a new version number.
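The rule above can be written down as a small decision function. This is only a sketch of the guideline as stated in this section: the class and field names are hypothetical and do not come from any VESPA or DataCite tool.

```python
from dataclasses import dataclass


@dataclass
class CollectionUpdate:
    """Describes a change applied to a data collection (hypothetical model)."""
    appends_new_data: bool = False
    modifies_existing_data: bool = False
    recalibrated: bool = False
    metadata_significantly_altered: bool = False


def needs_new_doi(update: CollectionUpdate) -> bool:
    """Return True when the guideline calls for a new identifier and version.

    Appending new data alone does not require a new version; changing
    previously recorded data, recalibrating, or significantly altering
    the metadata does.
    """
    return (
        update.modifies_existing_data
        or update.recalibrated
        or update.metadata_significantly_altered
    )


# A continuously growing collection keeps its DOI.
print(needs_new_doi(CollectionUpdate(appends_new_data=True)))
# A recalibration leads to a new DOI with a new version number.
print(needs_new_doi(CollectionUpdate(recalibrated=True)))
```

What counts as a "significant" metadata change remains a judgment call for each project and is not captured by this sketch.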
The main science outputs of the VESPA project are EPNcore metadata tables. Those metadata are distributed as added-value collections to an original collection (or a set of original collections). This is typically the case for VESPA/EPNcore metadata pointing to remote data collections. If the metadata collection is not produced by the team in charge of the original data collection, it is required to trace the contribution of each person or group, as well as to link to the original data collection through a related identifier link, using the relation type IsMetadataFor. As a general rule, we recommend setting a DOI on individual EPNcore metadata tables, with correct attribution and citation.
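As a minimal sketch of the related identifier link described above, the fragment below builds the related-identifier part of a record for an EPNcore metadata table. The key names follow our understanding of the DataCite JSON serialisation and should be checked against (R01). The helper name and the DOI are hypothetical placeholders, not real records.

```python
import json


def metadata_for_link(data_collection_doi: str) -> dict:
    """Build a related-identifier entry stating that the described resource
    (here, an EPNcore metadata table) is metadata for a data collection."""
    return {
        "relatedIdentifier": data_collection_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsMetadataFor",
    }


# Placeholder DOI of the original data collection (not a real record).
record_fragment = {
    "relatedIdentifiers": [metadata_for_link("10.5072/example-data-collection")]
}

print(json.dumps(record_fragment, indent=2))
```

The reverse link (from the data collection to its metadata table) can only be recorded by whoever maintains the DOI of the original collection.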
The RDA (Research Data Alliance) recommends assigning persistent identifiers to both data and metadata. This is the first indicator of the RDA FAIR data maturity model (RDA-F1-01M), and it is also recommended by GO FAIR (https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/).
Citable and persistent identifiers are very useful (although not used very much) for scientific software, for the sake of reproducibility. DOIs should be set on main software releases.
For online and web server applications, identifiers could be used to refer to a given version of the interface. However, access to previously published versions is usually not possible.
Minting DOIs for a data object composed of search query parameters and associated results is studied by many groups. It is promising in the scope of science reproducibility. Tools have been developed on that topic, such as the VAMDC Query Store, and should be studied.
The RA hosts a database, which stores metadata for all submitted DOI records. We describe here the DataCite metadata dictionary (R01). The DataCite metadata are limited but flexible, and are designed for citation and attribution.
This dictionary includes mandatory metadata, as listed in Table 2 of (R01):
- Identifier (with mandatory type sub-property)
This is the DOI.
- Creator (with optional given name, family name, name identifier and affiliation sub-properties)
The creator (or list of creators) of the resource. This is the person responsible for the content and the maintenance of the DOI. ORCIDs should be used here, as also recommended by PDS-SBN (R03).
- Title (with optional type sub-properties)
The title of the resource. It must be explicit in a wide context (think of it as a journal publication title).
- Publisher
The Publisher is the institution or data centre responsible for the accessibility of the resource.
- PublicationYear
In the general case, the current year, at the time of DOI minting.
- ResourceType (with mandatory general type description sub-property)
There is a list of predefined general resource types, but you may add a more specific resource type (see the examples below).
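The mandatory set can be checked with a few lines of code before a record is submitted. The sketch below is an illustration only: the dictionary keys mirror the property names of (R01) and not a particular serialisation, and the record content (DOI, names, titles) is made up for the example.

```python
# Mandatory DataCite properties, as listed in this section.
MANDATORY_PROPERTIES = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory property names that are absent or empty."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


# Made-up record used only to exercise the check.
record = {
    "Identifier": {"identifier": "10.5072/example", "identifierType": "DOI"},
    "Creator": [{"creatorName": "Doe, Jane"}],
    "Title": ["Example EPNcore metadata table"],
    "Publisher": "Example Data Centre",
    "PublicationYear": 2020,
    "ResourceType": {
        "resourceTypeGeneral": "Dataset",
        "resourceType": "EPNcore metadata table",
    },
}

print(missing_mandatory(record))  # []

incomplete = {key: value for key, value in record.items() if key != "Publisher"}
print(missing_mandatory(incomplete))  # ['Publisher']
```

A real submission would rather be validated against the official schema. This check only catches missing top-level properties.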
In addition to the minimal set of required metadata, there are metadata recommended by DataCite (and VESPA):
- Subject (with scheme sub-property)
From a thesaurus, preferably a community-acknowledged one. DataCite provides examples such as the Library of Congress Subject database (http://id.loc.gov/authorities/subjects.html). VESPA also recommends the Unified Astronomy Thesaurus (R02) or other relevant community thesauri, for keyword-based discovery. This is also studied by the PDS-SBN team (R03).
- Contributor (with optional given name, family name, name identifier and affiliation sub-properties)
Reference all contributions with roles.
- Date (with type sub-property)
Object life event dates (creation, publication...)
- Description (with type sub-property)
Abstract-type description of the object
- RelatedIdentifier (with type and relation type sub-properties)
Links to other documents, data or objects, qualified with a relationType. This is very useful to track the links between objects (e.g., data collection as source of a paper).
- GeoLocation (with point, box and polygon sub-properties)
When available or applicable
Correct citation implies correct attribution of the work, which is necessary with respect to scientific ethics. The creator is thus required and contributors are strongly recommended. The list of contributor roles is explained in Appendix 1 of (R01). The related identifiers also contain the bibliographic references. Those metadata should not be overlooked.
Some DataCite optional metadata are recommended in the VESPA context:
- Version
Versioning is recommended whenever an object is subject to change.
- Rights
The general rule is to have open access licences. Creative Commons Attribution (CC-BY) is the minimal level for documents. Many science teams select the CC-BY-NC-SA licence (attribution, non-commercial and share alike). Open source licences such as GPLv3, MIT or Apache are recommended for software.
- AlternateIdentifier (with type sub-property)
For documents, it could be a VESPA document identifier (e.g., project deliverable identifiers), an arXiv preprint identifier, an ADS Bibcode... For data collections, a data centre identifier (e.g., NASA/PDS bundle LID/VID, or SPASE resource ID).
- Affiliations and nameIdentifiers for creators and contributors.
Other types of identifiers
DataCite is also promoting the use of other identifiers for names and affiliations. We list below identifiers for persons and institutions. The name and institutional identifiers can be used for Creator and Contributor metadata. However, institutional identifiers are primarily used for Affiliation metadata.
DataCite recommends the use of ORCID (https://orcid.org). In the frame of the VESPA plasma and solar related objects (as well as those related to SPIDER), SPASE Person identifiers could also be used.
Table 2. Examples of name identifiers for a few VESPA participants
|VESPA participant||ORCID||SPASE Person Id|
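Before an ORCID iD is stored in a participant registry, its final character can be verified: ORCID iDs end with a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below implements that check. The function names are ours, and the sample iD is the one ORCID uses in its documentation for a fictitious researcher, not a VESPA participant.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, the digits and the check character of an ORCID iD
    written as 'dddd-dddd-dddd-dddc'."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A valid check character only shows that the iD is well formed. It does not prove that the iD is registered or that it belongs to the intended person.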
DataCite recommends the use of the Research Organization Registry (ROR, https://ror.org).
Table 1. Identifiers for the research organizations participating in VESPA and SPIDER.
|VESPA Participating Institution||ROR identifier|
|Observatoire de Paris||https://ror.org/029nkcm90|
|University of Bristol||https://ror.org/0524sp257|
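To illustrate how name and affiliation identifiers fit together, the sketch below assembles a Creator entry carrying an ORCID and a ROR identifier. The key names follow our understanding of the DataCite JSON serialisation (their exact spelling, e.g. `schemeUri`, is an assumption to be checked against (R01)). The person is the fictitious researcher used in ORCID documentation, and pairing that name with a ROR identifier from Table 1 is purely illustrative.

```python
import json


def person_with_identifiers(given: str, family: str, orcid: str,
                            affiliation_name: str, ror_url: str) -> dict:
    """Build a creator (or contributor) entry with ORCID and ROR identifiers."""
    return {
        "name": f"{family}, {given}",
        "nameType": "Personal",
        "givenName": given,
        "familyName": family,
        "nameIdentifiers": [
            {
                "nameIdentifier": f"https://orcid.org/{orcid}",
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }
        ],
        "affiliation": [
            {
                "name": affiliation_name,
                "affiliationIdentifier": ror_url,
                "affiliationIdentifierScheme": "ROR",
            }
        ],
    }


# Fictitious person; the ROR identifier is taken from Table 1.
creator = person_with_identifiers(
    "Josiah", "Carberry", "0000-0002-1825-0097",
    "Observatoire de Paris", "https://ror.org/029nkcm90",
)
print(json.dumps(creator, indent=2))
```

The same structure applies to contributors, with the addition of a contributor role.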
Other identifiers may be used, from other registries, such as the Répertoire National des Structures de Recherche (RNSR, https://appliweb.dgri.education.fr/rnsr/) in France, see Table 2.
Table 2. Identifiers for French institutes participating in VESPA. For those identifiers, the IdentifierScheme and SchemeURI of the DataCite metadata should be set to 'RNSR' and 'https://appliweb.dgri.education.fr/rnsr/', respectively.
|Institute (Beneficiary)||RNSR Identifier|
Other identifiers could be used for referring to datacenters or data repositories. In this case, the Re3data.org registry is recommended. Table 3 provides Re3data identifiers for VESPA data centres.
Table 3. Identifiers for VESPA related data centres.
|Data Centre||Location||Re3data URI|
Building the Identifier
The DOI is composed of three parts:
- The <protocol> part is usually set to either
- The <prefix> part is the identifier of the naming authority (e.g., 10.25935 for Obs. Paris; 10.5281 for Zenodo).
- The <suffix> part is set by the naming authority.
The DOI must be unique. A common recommendation is to have an opaque and random <suffix> (see, e.g., the last bullet of the ANDS DOI FAQ, "Can DOI strings include our institutional name?"). It is thus strongly discouraged to put anything readable or parsable in the <suffix>.
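The following sketch shows one possible way to produce an opaque, random suffix and to assemble the three parts. It rests on several assumptions: the alphabet (digits and lowercase letters without i, l, o and u, to avoid look-alike characters), the grouping into two blocks of four characters, and the use of the `https://doi.org/` resolver form as the protocol part are all illustrative choices, not a VESPA or DataCite requirement. Uniqueness must still be checked against the DOIs already minted under the prefix.

```python
import secrets

# Assumed alphabet: 32 characters, leaving out the look-alike letters i, l, o, u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def random_suffix(groups: int = 2, group_length: int = 4) -> str:
    """Return an opaque random suffix such as 'xxxx-xxxx'."""
    return "-".join(
        "".join(secrets.choice(ALPHABET) for _ in range(group_length))
        for _ in range(groups)
    )


def build_doi(prefix: str, suffix: str, protocol: str = "https://doi.org/") -> str:
    """Assemble <protocol><prefix>/<suffix>."""
    return f"{protocol}{prefix}/{suffix}"


# 10.25935 is the Obs. Paris prefix quoted above; the suffix is random.
print(build_doi("10.25935", random_suffix()))
```

With these defaults, eight characters from a 32-character alphabet give 32^8 (about 10^12) possible suffixes, which keeps accidental collisions unlikely for the number of objects a single data centre mints.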
Selecting the DOI provider
Since the DOI is a persistent identifier to a persistent resource, the DOI provider commits to keep the resource available and accessible. The DOI provider should then be the publisher of the resource.
Zenodo.org may be selected as a simple solution. However, Zenodo.org records are attached to a personal account, and thus can only be updated by the initial submitter. This could cause a problem when updating the resource on long time scales.
The landing pages are essential elements of the DOI chain. DataCite proposes basic level landing page best practices (R04).
This is the place where the data publisher provides information on the data content, its format, its access rules, etc. The landing page should be an intermediate page, not directly the content itself. It is a good practice to recall the DOI on the landing page and provide the user with the citation to be used with this data at the beginning of the page.
The page should contain at least a link to the data, with concise information on the content (data formatting, size of the dataset, etc).
Adding web semantic metadata is also recommended. This takes the form of JSON-LD mark-up following the schema.org model. DataCite provides JSON-LD formatted content that can be included in the landing pages. Extra metadata from schema.org, which are not managed by DataCite, can also be added, such as the
variableMeasured entity (conceptually close to the VESPA
|REC-DOI-01||Scope||Any output requiring long-term availability (document, collection, software release) should be citable with DOI.|
|REC-DOI-02||Metadata||Metadata (EPNcore tables) should have their own DOI, with the adequate related identifier link to the corresponding data collection. This recommendation is particularly applicable when:|
|REC-DOI-03||Versioning||When an object requires versioning, we follow the Zenodo scheme: a main DOI always points to the latest version, and a separate DOI is set for each version. Versioning should only be used when an object is subject to change.|
|REC-DOI-04||Mapping Names and Affiliations||VESPA should build a local registry database for participants with DOI metadata export capability (including ORCID for name identifiers, and ROR for affiliation identifiers)|
|REC-DOI-05||Attribution and Citation||Creator and contributors should be acknowledged in the DataCite record, for adequate attribution.|
|REC-DOI-06||Name Identifiers||Recommend adoption and usage of ORCID|
|REC-DOI-07||Subjects||Recommend usage of UAT (Unified Astronomy Thesaurus, R02)|
|REC-DOI-08||Licensing||Licensing information should be available in the DOI metadata record.|
|REC-DOI-09||Identifier||The suffix part of the DOI must be opaque and random.|
|REC-DOI-10||Publisher||The DOI provider must be the resource publisher.|
|REC-DOI-11||Landing Page / Citation||The landing page should provide the citation to be used when the object is cited.|
|REC-DOI-12||Landing Page / Semantic||Use the Schema.org semantic mark-up in the landing page to increase the visibility of the object, specifically for datasets.|
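As an illustration of REC-DOI-11 and REC-DOI-12, the sketch below generates schema.org JSON-LD mark-up for a dataset landing page and wraps it in the script element that is embedded in the HTML page. The property names (`Dataset`, `identifier`, `license`, `variableMeasured`) come from schema.org. The DOI, title, description and variable names are placeholders, and the function name is ours.

```python
import json


def landing_page_jsonld(doi: str, name: str, description: str,
                        license_url: str, variables: list) -> dict:
    """Build a schema.org Dataset description for a DOI landing page."""
    doi_url = f"https://doi.org/{doi}"
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": doi_url,
        "identifier": doi_url,
        "name": name,
        "description": description,
        "license": license_url,
        "variableMeasured": list(variables),
    }


# Placeholder values, not a real VESPA record.
markup = landing_page_jsonld(
    doi="10.5072/example",
    name="Example EPNcore metadata table",
    description="Illustrative dataset description for a landing page.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    variables=["example variable"],
)

# The mark-up is embedded in the landing page inside a script element.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

In practice, the JSON-LD content provided by DataCite for a registered DOI can serve as the starting point, with extra schema.org properties such as `variableMeasured` added by the data centre.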
R01. DataCite Metadata Working Group. (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69
R02. Frey, K., Accomazzi, A. The Unified Astronomy Thesaurus: Semantic Metadata for Astronomy and Astrophysics. The Astrophysical Journal Supplement Series, Volume 236, Issue 1, article id. 24, 7 pp. (2018). https://doi.org/10.3847/1538-4365/aab760
R03. Raugh, Anne, DataCite Schema review. PDS Small Body Node wiki (2019). http://sbndev.astro.umd.edu/wiki/DataCite_Schema
R04. DataCite Landing Page Best Practices. https://support.datacite.org/docs/landing-pages