


EPN2024-RI


EUROPLANET2024 Research Infrastructure 

H2020-INFRAIA-2019-1  

Grant agreement no: 871149


Document: VESPA-WP6-2-041-TN-v1.0(83)


doi:10.25935/p90x-ty59


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License







Digital Object Identifiers for VESPA




Date: 2020-06-19


Start date of project: 01 February 2020

Duration: 48 Months

Responsible WP Leader: Stéphane Erard


Project co-funded by the European Union's Horizon 2020 research and innovation programme

Dissemination level

PU: Public
PP: Restricted to other programme participants (including the Commission Services)
RE: Restricted to a group specified by the consortium (including the Commission Services)
CO: Confidential, only for members of the consortium (excluding the Commission Services)

Project Number: 871149
Project Title: EPN2024 - RI
Project Duration: 48 months: 01 February 2020 – 31 January 2024
Document Number: WP6-task2-041-v1.0
Persistent Identifier: doi:10.25935/p90x-ty59
Issue date:
Title of Document: Digital Object Identifiers for VESPA
Contributing Work package (s): WP6
Dissemination level: PU
License: CC-BY-SA
Author (s):

Abstract: This document describes the use of Digital Object Identifiers (DOI) within the VESPA activity.



Document history (to be deleted before submission to Commission)

Date | Version | Editor | Change | Status
 | 0.1 | | First draft | DRAFT
 | 0.2 | Baptiste Cecconi | Added ROR entries for VESPA participants; added ORCID examples; added References; improved Recommendations | DRAFT
 | 0.3 | | Added Examples | DRAFT
 | 0.4 | Baptiste Cecconi | More details on metadata recommendation | DRAFT
 | 0.5 | | Added reference to PDS-SBN DataCite notes | DRAFT
 | 0.6 | | Added section on Landing Page | DRAFT
 | 0.7 | | Added R04 | DRAFT
 | 0.8 | | Added reference to RDA FAIR data maturity model indicators, and added REC-DOI-02 (renumbered the recommendations) | DRAFT
 | 1.0 | | Added other identifiers | RELEASED

Introduction

Digital Object Identifiers (DOI) are persistent identifiers managed by a Registration Agency (RA), such as DataCite. Persistent identifiers allow long-term traceability, identification and citation of online resources. We propose to use DOIs for VESPA documents, data collections and software. This document presents the guidelines for the use of DOIs within VESPA.

In this document, we consider the DOI implementation and metadata as managed by Datacite.

A DOI is persistent and so is the metadata stored by the RA. Data centres using DOIs are registered with an RA and commit to keep a landing page for each DOI. This landing page provides either access to the object, or up-to-date information about the object. The landing page URL is stored in the RA database, and may be updated as often as necessary by the data centre.

Should an object disappear or be removed for any reason, the landing page must be updated with adequate information: the reason for deletion and the new version available, if any. When the object is no longer available in the version described by the DOI, the landing page becomes a tombstone page.

Most universities can mint DOIs through direct contracts with an RA or through relevant national information technology institutions (e.g., through INIST, in France). Other data centres or projects offer to mint DOIs for hosted products. One example is Zenodo.org: anybody can upload content (up to 50GB) and get a DOI for a document, a dataset, or any other type of digital content.

Scope

DOIs are useful for citation and reference, as we should not rely on world wide web URLs, which are known to die eventually. DOIs can be used for any type of citable (digital or physical) objects. We list below the use cases for VESPA.

Documents

Any public deliverables, or other reports and documents.

Data Collections

Identifiers on data can be tricky, and there have been a lot of discussions in many contexts on that topic. There must be a decision on granularity for each project. The granularity is usually set to the data collection level rather than the data file level. However, the data collection should be defined in each context.

Note that continuously growing data collections should be seen as a single collection (no versioning required for every new data addition), unless the new data modifies previously recorded data. If recalibration occurs or metadata are significantly altered, then a new identifier should be minted, with a new version number.

The main science outputs of the VESPA project are EPNcore metadata tables. Those metadata are distributed as added-value collections to an original collection (or a set of original collections). This is typically the case for VESPA/EPNcore metadata pointing to remote data collections. If the metadata collection is not produced by the team in charge of the original data collection, the contribution of each person or group must be traced, and the original data collection must be linked through a related identifier, using the relation type IsMetadataFor. As a general rule, we recommend assigning a DOI to each individual EPNcore metadata table, with correct attribution and citation.
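The link from an EPNcore table to its source collection can be sketched as a related-identifier entry. This is a minimal illustration only: the dictionary layout, the helper name metadata_for_link and the DOI 10.1234/example-collection are hypothetical placeholders, while the relation type IsMetadataFor is the one named above.

```python
# Hypothetical record layout; the keys loosely mirror DataCite property names.
def metadata_for_link(original_doi: str) -> dict:
    """Build a related-identifier entry pointing to the original data collection."""
    return {
        "relatedIdentifier": original_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsMetadataFor",
    }


# Placeholder title and DOI, used for illustration only.
epncore_record = {
    "title": "EPNcore metadata table for an example collection",
    "relatedIdentifiers": [metadata_for_link("10.1234/example-collection")],
}

print(epncore_record["relatedIdentifiers"][0]["relationType"])
```

When one EPNcore table covers several original collections, the same helper could be called once per collection DOI.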

The RDA (Research Data Alliance) recommends assigning persistent identifiers to both data and metadata. This is the first indicator of the RDA FAIR data maturity model (RDA-F1-01M), and it is also recommended by GoFAIR (https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/).

Software

Citable and persistent identifiers are very useful (although not used very much) for scientific software, for the sake of reproducibility. DOIs should be set on main software releases.

For online and web server applications, identifiers could be used to refer to a given version of the interface. However, access to previously published versions is usually not possible.

Note that any code hosted on Github.com can have releases, which can then be published on Zenodo.org.

Search Queries

Minting DOIs for a data object composed of search query parameters and associated results is being studied by many groups. It is promising for science reproducibility. Tools have been developed on that topic, such as the VAMDC Query Store, and should be investigated.

Metadata

The RA hosts a database, which stores metadata for all submitted DOI records. We describe here the Datacite metadata dictionary (R01). The Datacite metadata is limited but flexible. It is designed for citation and attribution. 

Mandatory keywords

This dictionary includes mandatory metadata, as listed in Table 2 of (R01): 

  • Identifier (with mandatory type sub-property)
    This is the DOI.
  • Creator (with optional given name, family name, name identifier and affiliation sub-properties)
    The creator (or list of creators) of the resource. This is the person responsible for the content and the maintenance of the DOI. ORCIDs should be used here, as also recommended by PDS-SBN (R03).
  • Title (with optional type sub-properties)
    The title of the resource. It must be explicit in a wide context (think of it as a journal publication title).
  • Publisher
    The Publisher is the institution or data centre responsible for the accessibility of the resource.
  • PublicationYear
    In the general case, the current year at the time of DOI minting.
  • ResourceType (with mandatory general type description sub-property)
    There is a list of predefined general resource types, but a more specific resource type may be added (see the examples below).
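A draft record can be checked against this mandatory list before submission. The sketch below assumes a flat dictionary keyed by the property names listed above; this layout and the helper name missing_mandatory are illustrative assumptions, not the DataCite submission format.

```python
# Mandatory property names, as listed above.
MANDATORY = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory property names that are absent or empty in record."""
    return [name for name in MANDATORY if not record.get(name)]


# Draft record for this document, with Creator and Publisher left out on purpose.
draft = {
    "Identifier": "10.25935/p90x-ty59",
    "Title": "Digital Object Identifiers for VESPA",
    "PublicationYear": 2020,
    "ResourceType": "Text",
}

print(missing_mandatory(draft))  # ['Creator', 'Publisher']
```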

Recommended Metadata

In addition to the minimal set of required metadata, there are metadata recommended by Datacite (and VESPA): 

  • Subject (with scheme sub-property)
    From a thesaurus, preferably a community-acknowledged one. DataCite provides examples such as the Library of Congress Subject database (http://id.loc.gov/authorities/subjects.html). VESPA also recommends the Unified Astronomy Thesaurus (R02) or other relevant community thesauri, for keyword-based discovery. This is also studied by the PDS-SBN team (R03).
  • Contributor (with optional given name, family name, name identifier and affiliation sub-properties)
    Reference all contributions with roles.
  • Date (with type sub-property)
    Object life event dates (creation, publication...)
  • Description (with type sub-property)
    Abstract-type description of the object
  • RelatedIdentifier (with type and relation type sub-properties)
    Links to other documents, data or objects through a relationType. This is very useful to track the links between objects (e.g., data collection as source of a paper).
  • GeoLocation (with point, box and polygon sub-properties)
    When available or applicable

Correct citation implies correct attribution of the work, which is necessary with respect to scientific ethics. The creator is thus required and contributors are strongly recommended. The list of contributor roles is explained in Appendix 1 of (R01). The related identifiers also contain the bibliographic references. Those metadata should not be overlooked.

Optional Metadata

Some Datacite optional metadata are recommended in the VESPA context:

  • Version
    Versioning is recommended whenever an object is subject to change
  • Rights
    The general rule is to have open access licences. Creative Commons Attribution (CC-BY) is the minimal level for documents. Many science teams select the CC-BY-NC-SA licence (attribution, non-commercial and share alike). Open source licences such as GPLv3, MIT or Apache are recommended for software.
  • AlternateIdentifier (with type sub-property)
    For documents, it could be a VESPA document identifier (e.g., project deliverable identifiers), an arXiv preprint identifier, an ADS Bibcode... For data collections, a data centre identifier (e.g., NASA/PDS bundle LID/VID, or SPASE resource ID).

  • Affiliations and nameIdentifiers for creators and contributors.

Other types of identifiers

DataCite also promotes the use of other identifiers for names and affiliations. We list below identifiers for persons and institutions. The name and institutional identifiers can be used for Creator and Contributor metadata. However, institutional identifiers are primarily used for Affiliation metadata.

Name identifiers

DataCite recommends the use of ORCID (https://orcid.org). For VESPA plasma- and solar-related objects (as well as those related to SPIDER), SPASE Person identifiers could also be used.

Table 2. Examples of name identifiers for a few VESPA participants
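Before an ORCID is stored as a name identifier, its final character can be verified, since ORCID iDs end with an ISO 7064 MOD 11-2 check character. The sketch below is illustrative: the function names and the dictionary layout are assumptions, and 0000-0002-1825-0097 is the fictitious sample iD used in ORCID documentation, not a VESPA participant.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_char(base) == compact[15]


def orcid_name_identifier(orcid: str) -> dict:
    """Hypothetical name-identifier entry for a Creator or Contributor."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"invalid ORCID: {orcid}")
    return {
        "nameIdentifier": orcid,
        "nameIdentifierScheme": "ORCID",
        "schemeURI": "https://orcid.org",
    }


print(is_valid_orcid("0000-0002-1825-0097"))  # True
```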

Institutional identifiers

DataCite recommends the use of the Research Organization Registry (ROR, https://ror.org).

Table 1. Identifiers for the research organizations participating in VESPA and SPIDER.

Other identifiers may be used, from other registries, such as the Répertoire National des Structures de Recherche (RNSR, https://appliweb.dgri.education.fr/rnsr/) in France, see Table 2.

Table 2. Identifiers for French institutes participating in VESPA. For those identifiers, the IdentifierScheme and SchemeURI of the DataCite metadata should be set to 'RNSR' and 'https://appliweb.dgri.education.fr/rnsr/', respectively.

Institute (Beneficiary) | RNSR Identifier
CDS (CNRS) | 199712602R
CRPG (CNRS) | 201320574L
DIO (ObsParis) | 200610854B
GEOPS (CNRS) | 200412804E
IPAG (CNRS) | 201119432D
IPSL (CNRS) | 200610636P
IRAP (CNRS) | 201119477C
LESIA (ObsParis) | 200212766X
LMD (CNRS) | 199812867Z
USN (ObsParis) | 199519805D
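An affiliation entry carrying an RNSR identifier can be sketched as follows. The identifiers are copied from Table 2, and the scheme and scheme URI are the values given in its caption. The key names (affiliationIdentifier, affiliationIdentifierScheme, schemeURI) and the helper rnsr_affiliation are assumptions made for illustration and should be checked against the DataCite schema version in use.

```python
# A few RNSR identifiers copied from Table 2.
RNSR_IDS = {
    "CDS": "199712602R",
    "IRAP": "201119477C",
    "LESIA": "200212766X",
}

RNSR_SCHEME = "RNSR"
RNSR_SCHEME_URI = "https://appliweb.dgri.education.fr/rnsr/"


def rnsr_affiliation(institute: str) -> dict:
    """Build an affiliation entry for an institute listed in RNSR_IDS."""
    return {
        "name": institute,
        "affiliationIdentifier": RNSR_IDS[institute],
        "affiliationIdentifierScheme": RNSR_SCHEME,
        "schemeURI": RNSR_SCHEME_URI,
    }


print(rnsr_affiliation("LESIA"))
```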

Other identifiers could be used for referring to data centres or data repositories. In this case, the Re3data.org registry is recommended. Table 3 provides Re3data identifiers for VESPA data centres.

Table 3. Identifiers for VESPA related data centres.

Building the Identifier

The DOI is composed of three parts: <protocol><prefix>/<suffix>.

  • The <protocol> part is usually set to either doi: or https://doi.org/.
  • The <prefix> part is the identifier of the naming authority (e.g., 10.25935 for Obs. Paris; 10.5281 for Zenodo).
  • The <suffix> part is set by the naming authority.

The DOI must be unique. A common recommendation is to have an opaque and random <suffix> (see, e.g., the last bullet of the ANDS DOI FAQ, "Can DOI strings include our institutional name?"). It is thus strongly discouraged to put anything readable or parsable in the <suffix> part.
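The three-part structure and the opaque suffix can be sketched as follows. The alphabet, the two groups of four characters and the function names are illustrative assumptions, not a rule imposed by any Registration Agency. Uniqueness still has to be checked against the identifiers already minted by the naming authority.

```python
import re
import secrets

# Hypothetical alphabet: lowercase letters and digits, without i, l, o and u.
ALPHABET = "abcdefghjkmnpqrstvwxyz0123456789"

DOI_PATTERN = re.compile(r"^(doi:|https://doi\.org/)(10\.[0-9.]+)/(\S+)$")


def random_suffix(groups: int = 2, size: int = 4) -> str:
    """Draw an opaque random suffix such as 'xxxx-xxxx'."""
    return "-".join(
        "".join(secrets.choice(ALPHABET) for _ in range(size))
        for _ in range(groups)
    )


def build_doi(prefix: str, protocol: str = "doi:") -> str:
    """Assemble <protocol><prefix>/<suffix> with a freshly drawn suffix."""
    return f"{protocol}{prefix}/{random_suffix()}"


def split_doi(doi: str) -> tuple:
    """Split a DOI string into its (protocol, prefix, suffix) parts."""
    match = DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a recognised DOI string: {doi}")
    return match.groups()


print(split_doi("doi:10.25935/p90x-ty59"))  # ('doi:', '10.25935', 'p90x-ty59')
print(build_doi("10.25935"))
```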

Selecting the DOI provider

Since the DOI is a persistent identifier to a persistent resource, the DOI provider commits to keep the resource available and accessible. The DOI provider should then be the publisher of the resource.

Zenodo.org may be selected as a simple solution. However, Zenodo.org records are attached to a personal account, and thus can only be updated by the initial submitter. This could cause a problem when updating the resource over long time scales.

Landing Pages

The landing pages are essential elements of the DOI chain. DataCite proposes basic level landing page best practices (R04).

This is the place where the data publisher provides information on the data content, its format, its access rules, etc. The landing page should be an intermediate page, not directly the content itself. It is a good practice to recall the DOI on the landing page and provide the user with the citation to be used with this data at the beginning of the page. 
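A citation string for the top of the landing page can be assembled from the mandatory metadata. The sketch below uses one possible layout (creators, year, title, publisher, DOI URL), which is an assumption and not a format prescribed by this document. The creator and publisher values are placeholders; the title, year and DOI are those of this document.

```python
def format_citation(creators, year, title, publisher, doi):
    """Assemble a citation line from the mandatory DOI metadata."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


citation = format_citation(
    creators=["Doe, J."],            # placeholder creator
    year=2020,
    title="Digital Object Identifiers for VESPA",
    publisher="Example Data Centre",  # placeholder publisher
    doi="10.25935/p90x-ty59",
)
print(citation)
```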

The page should contain at least a link to the data, with concise information on the content (data formatting, size of the dataset, etc). 

Adding semantic web metadata is also recommended. This takes the form of JSON-LD mark-up following the schema.org model. DataCite provides JSON-LD formatted content that can be included in the landing pages. Extra metadata from schema.org that are not managed by DataCite can also be added, such as the variableMeasured property (conceptually close to the VESPA measurement_type keyword).
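A minimal JSON-LD block for a dataset landing page could be generated as sketched below. The property names (name, description, identifier, license, variableMeasured) come from the schema.org Dataset type. The helper names and all example values are hypothetical, and a real page would normally start from the JSON-LD content provided by DataCite.

```python
import json


def dataset_jsonld(name, doi, description, variables, license_url):
    """Build a schema.org Dataset description as a Python dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "variableMeasured": list(variables),
    }


def as_script_tag(doc):
    """Wrap the JSON-LD document in the script element embedded in HTML pages."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, for illustration only.
example = dataset_jsonld(
    name="Example EPNcore metadata table",
    doi="10.1234/example-table",
    description="Illustrative dataset description.",
    variables=["example_measurement_type"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(as_script_tag(example))
```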

Examples

Recommendations

ID | Topic | Recommendation
REC-DOI-01 | Scope | Any output requiring long-term availability (document, collection, software release) should be citable with a DOI.
REC-DOI-02 | Scope | Metadata (EPNcore tables) should have their own DOI, with the adequate related identifier link to the corresponding data collection. This recommendation is particularly applicable when the EPNcore table covers collections identified by several DOIs, or when the EPNcore table and the corresponding data are not managed by the same team.
REC-DOI-03 | Versioning | When an object requires versioning, we follow the Zenodo scheme: we set a main DOI (always pointing to the latest version), and a DOI for each version. Versioning should only be used when an object is subject to change.
REC-DOI-04 | Mapping Names and Affiliations | VESPA should build a local registry database for participants with DOI metadata export capability (including ORCID for name identifiers, and ROR for affiliation identifiers).
REC-DOI-05 | Attribution and Citation | Creator and contributors should be acknowledged in the DataCite record, for adequate attribution.
REC-DOI-06 | Name Identifiers | Recommend adoption and usage of ORCID.
REC-DOI-07 | Subjects | Recommend usage of UAT (Unified Astronomy Thesaurus, R02).
REC-DOI-08 | License | Licensing information should be available in the DOI metadata record.
REC-DOI-09 | Identifier | The suffix part of the DOI must be opaque and random.
REC-DOI-10 | Publisher | The DOI provider must be the resource publisher.
REC-DOI-11 | Landing Page / Citation | The landing page should provide the citation to be used when the object is cited.
REC-DOI-12 | Landing Page / Semantic | Use the Schema.org semantic mark-up in the landing page to increase the visibility of the object, specifically for datasets.
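The versioning scheme of REC-DOI-03 (one main DOI plus one DOI per version) can be sketched with related-identifier links. The relation types HasVersion and IsVersionOf are DataCite relation types. The function names, the data layout and the DOIs are hypothetical placeholders, and the sketch assumes the version DOIs are listed from oldest to latest.

```python
def related(doi, relation_type):
    """Build one related-identifier entry (hypothetical layout)."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def version_links(main_doi, version_dois):
    """Return the links per DOI and the version the main DOI should resolve to."""
    if not version_dois:
        raise ValueError("at least one version DOI is required")
    # The main DOI lists every version.
    links = {main_doi: [related(v, "HasVersion") for v in version_dois]}
    # Each version DOI points back to the main DOI.
    for version_doi in version_dois:
        links[version_doi] = [related(main_doi, "IsVersionOf")]
    # Assumes version_dois is ordered from oldest to latest.
    return links, version_dois[-1]


# Placeholder DOIs, for illustration only.
links, latest = version_links(
    "10.1234/main-0000",
    ["10.1234/vers-0001", "10.1234/vers-0002"],
)
print(latest)  # 10.1234/vers-0002
```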

References

R01. DataCite Metadata Working Group. (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69

R02. Frey, K., Accomazzi, A. The Unified Astronomy Thesaurus: Semantic Metadata for Astronomy and Astrophysics. The Astrophysical Journal Supplement Series, Volume 236, Issue 1, article id. 24, 7 pp. (2018). https://doi.org/10.3847/1538-4365/aab760 

R03. Raugh, Anne, DataCite Schema review. PDS Small Body Node wiki (2019). http://sbndev.astro.umd.edu/wiki/DataCite_Schema

R04. DataCite Landing Page Best Practices. https://support.datacite.org/docs/landing-pages
