⚠️ DEPRECATED: This version of the specification (Release 5) is deprecated. Please refer to the latest version.

Abstract

The HealthDCAT Application Profile (HealthDCAT-AP) is a domain-specific metadata model designed to support the implementation of the secondary use framework under the European Health Data Space (EHDS).

HealthDCAT-AP introduces new properties that enable semantically consistent dataset descriptions, structured for automated exchange and federation across catalogues. This ensures that metadata can be processed and integrated into local, regional, national and EU-level catalogues, including the one operated by the central platform of the HealthData@EU infrastructure.

A distinctive feature of HealthDCAT-AP is its explicit classification of dataset access rights into three categories: non-public, restricted, and public. Each category is defined with specific technical requirements.

In addition, new controlled vocabularies have been developed to harmonise property values, thereby ensuring comparability and interoperability of dataset descriptions across Member States.

Introduction

Context

The Health extension of DCAT-AP is being developed by the European Commission DG SANTE to fulfil the requirements of Article 77 of Regulation (EU) 2025/327 [[OJ L, 2025/327, 5.3.2025]] establishing the European Health Data Space (EHDS).

Article 77 requires Health Data Access Bodies (HDABs) to publish public catalogues describing the datasets they are responsible for under the secondary use framework, including their content, holders, and access conditions. These catalogues are a core component of the EHDS infrastructure: they are intended to inform potential data users about what data exists, who holds it, and under what terms it may be accessed for secondary use.

The purpose of HealthDCAT-AP is to ensure that datasets made available for secondary use are described in a clear, consistent and comparable manner across the Union. This harmonisation is essential to:

Scope of HealthDCAT-AP in Release 5

HealthDCAT-AP is developed by the European Commission DG SANTE in successive iterations. The DG SANTE implementation can be accessed in the HealthData@EU Central Platform, where a dedicated database was developed to store, display, search, and filter HealthDCAT-AP-compliant datasets. Compliance of the datasets is assured through a microservice, the HealthData@EU Validator (a customised implementation of the Interoperability Test Bed).

Additionally, the following tools are available to help users work with HealthDCAT-AP:

Scope of the Application Profile: Enhancing DCAT-AP for the secondary use of health data

HealthDCAT-AP, designed as an extension of DCAT-AP, incorporates its principal classes, such as dcat:Catalog, dcat:CatalogRecord, dcat:Dataset, dcat:Distribution, and dcat:DataService. This extension leverages RDF's flexible architecture to enhance metadata capabilities without compromising the stability of existing or under-development catalogue systems. By integrating new metadata elements as triples, the extension enriches the metadata model without altering established structures. To ensure interoperability, the extension adheres to several principles:

To effectively extend DCAT-AP, several best practices are recommended:

This structured approach to developing HealthDCAT-AP ensures that the metadata model is enhanced while maintaining interoperability and compliance with established standards, thereby supporting a more interconnected and accessible digital health ecosystem.
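The additive principle described in this section can be illustrated with a minimal Python sketch, assuming triples are modelled as plain tuples rather than with an RDF library. The dataset IRI, the category IRI, and the local names `healthCategory` and `numberOfRecords` are illustrative assumptions; only the two namespace IRIs are taken from the prefix list in this document.

```python
# Namespaces as listed under "Used Prefixes" in this document.
DCT = "http://purl.org/dc/terms/"
HEALTHDCATAP = "http://healthdataportal.eu/ns/health#"

# Hypothetical dataset IRI, used only for illustration.
DATASET = "https://example.org/dataset/1"

# An existing DCAT-AP description, modelled as (subject, predicate, object) tuples.
base_triples = {
    (DATASET, DCT + "title", "Example registry"),
    (DATASET, DCT + "description", "An illustrative dataset description."),
}

# Health-specific additions. The local names below are assumptions for this sketch.
health_triples = {
    (DATASET, HEALTHDCATAP + "healthCategory", "https://example.org/category/registry"),
    (DATASET, HEALTHDCATAP + "numberOfRecords", 1200),
}


def extend(base, extra):
    """Return a new graph holding every base triple plus the extra ones."""
    return set(base) | set(extra)


extended = extend(base_triples, health_triples)

# A consumer that only understands DCAT-AP can skip predicates from the health
# namespace and still recovers exactly the statements it had before.
dcat_ap_view = {t for t in extended if not t[1].startswith(HEALTHDCATAP)}
```

Because the extension only adds triples, `dcat_ap_view` equals `base_triples`, which is the property that keeps existing catalogue systems stable.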

The development of the HealthDCAT extension is an ongoing iterative process that will extend beyond the pilot project, incorporating continuous feedback and contributions from the stakeholder community. This document specifies a first iteration of the HealthDCAT-AP extension, a customised adaptation of the EU DCAT Application Profile. It adapts the DCAT data model specifically for health-related catalogued resources, in alignment with the Regulation on the European Health Data Space [[OJ L, 2025/327, 5.3.2025]]. This customisation facilitates the management and dissemination of health data within the regulatory framework, enhancing its accessibility and utility for health data users.

Comments and queries should be sent via the issue tracker of the dedicated GitHub repository.

Status

This application profile has the status European Commission Draft Specification, published on 2025-09-22.

License

Copyright © 2025 European Union. All material in this repository is published under the license CC-BY 4.0, unless explicitly otherwise mentioned.

Conformance Statement

For applications to comply with HealthDCAT-AP, they MUST first conform to DCAT-AP. Additionally, these applications MUST adhere to the specified constraints and usage guidelines, following conformance statements similar to those outlined in DCAT-AP.

Provider requirements

In order to conform to this Application Profile, an application that provides metadata MUST:
  • Apply the controlled vocabularies as described in section [[[#controlled-vocs]]].

Receiver requirements

In order to conform to this Application Profile, an application that receives metadata MUST be able to:
  • Process information for all classes and properties specified in section [[[#quick-reference]]].
  • Process information for all controlled vocabularies specified in section [[[#controlled-vocs]]].
  • "Processing" means that receivers must accept incoming data and transparently provide these data to applications and services. It neither implies nor prescribes what applications and services finally do with the data (parse, convert, store, make searchable, display to users, etc.).

    Terminology

    personal electronic health data means data concerning health and genetic data, processed in an electronic form; [[OJ L, 2025/327, 5.3.2025 Art.2.2(a)]]

    non-personal electronic health data means electronic health data other than personal electronic health data, including both data that have been anonymised so that they no longer relate to an identified or identifiable natural person (the 'data subject') and data that have never related to a data subject; [[OJ L, 2025/327, 5.3.2025 Art.2.2(b)]]

    health data holder means any natural or legal person, public authority, agency or other body in the healthcare or the care sectors, including reimbursement services where necessary, as well as any natural or legal person developing products or services intended for the health, healthcare or care sectors, developing or manufacturing wellness applications, performing research in relation to the healthcare or care sectors or acting as a mortality registry, as well as any Union institution, body, office or agency, that has either:

    Health Data Access Body means a body that is responsible for processing health data applications and granting access to electronic health data for secondary use if the application is accepted.

    An Application Profile is a specification that reuses terms from one or more base standards, adding more specificity by identifying mandatory, recommended and optional elements to be used for a particular application, as well as recommendations for controlled vocabularies to be used. Application refers to the usage context of the specification. It may be abstract, for instance covering a data theme such as mobility data, but it may also refer to specific tools such as open data portals. For DCAT-AP the usage scope is broad and primarily takes into account the European legal context.

    A Dataset is a collection of data, published or curated by a single source, and available for access or download in one or more formats. A Data Portal is a Web-based system that contains a data catalogue with descriptions of datasets and provides services enabling discovery and reuse of the datasets.

    Used Prefixes

    | Prefix | Namespace IRI |
    | adms | http://www.w3.org/ns/adms# |
    | csvw | http://www.w3.org/ns/csvw# |
    | dcat | http://www.w3.org/ns/dcat# |
    | dcatap | http://data.europa.eu/r5r/ |
    | dct | http://purl.org/dc/terms/ |
    | dctype | http://purl.org/dc/dcmitype/ |
    | dpv | https://w3id.org/dpv# |
    | dqv | http://www.w3.org/ns/dqv# |
    | foaf | http://xmlns.com/foaf/0.1/ |
    | healthdcatap | http://healthdataportal.eu/ns/health# |
    | locn | http://www.w3.org/ns/locn# |
    | odrl | http://www.w3.org/ns/odrl/2/ |
    | owl | http://www.w3.org/2002/07/owl# |
    | prov | http://www.w3.org/ns/prov# |
    | rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
    | rdfs | http://www.w3.org/2000/01/rdf-schema# |
    | skos | http://www.w3.org/2004/02/skos/core# |
    | spdx | http://spdx.org/rdf/terms# |
    | time | http://www.w3.org/2006/time# |
    | vcard | http://www.w3.org/2006/vcard/ns# |
    | xsd | http://www.w3.org/2001/XMLSchema# |
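The prefixes listed above are shorthand for full namespace IRIs. The following Python sketch shows one way a tool might expand and compact prefixed names; it copies only a subset of the list, and the helper names `expand` and `compact` are hypothetical, not part of any HealthDCAT-AP tooling.

```python
# A subset of the prefix list above; extend it with the remaining rows as needed.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatap": "http://data.europa.eu/r5r/",
    "dct": "http://purl.org/dc/terms/",
    "healthdcatap": "http://healthdataportal.eu/ns/health#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie):
    """Expand a prefixed name such as 'dcat:Dataset' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(iri):
    """Shorten a full IRI using the longest matching namespace, if any."""
    matches = [(ns, p) for p, ns in PREFIXES.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no known namespace: leave the IRI untouched
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{iri[len(ns):]}"
```

For example, `expand("dcat:Dataset")` yields `http://www.w3.org/ns/dcat#Dataset`, and `compact` reverses the mapping for IRIs in a known namespace.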

    Overview

    HealthDCAT-AP is an extension of DCAT-AP.

    This application profile is meant to provide a DCAT-AP-conformant representation of metadata specific to health datasets in scope of the EHDS Regulation [[OJ L, 2025/327, 5.3.2025]].

    HealthDCAT-AP extends DCAT-AP by including:

    HealthDCAT Application profile diagram

    An overview of HealthDCAT-AP is shown by the UML diagram below. The UML diagram illustrates the specification described in this document. For readability purposes, the representation has been condensed as follows: The cardinalities and qualifications are included in the figure.

    This document describes the usage of the following main entities for a correct usage of the Application Profile:
    | Agent | Catalogue | Catalogue Record | Catalogued Resource | Checksum | Data Service | Dataset | Dataset Series | Distribution | Kind | Licence Document | Location | Relationship | Table Group | Table | Column |

    The main entities are supported by:
    | Activity | Attribution | Checksum Algorithm | Concept | Concept Scheme | Document | Frequency | Geometry | Identifier | Legal Resource | Linguistic system | Literal | Media Type | Media Type or Extent | Period of Time | Policy | Provenance Statement | Resource | Rights statement | Role | Standard |

    And supported by these datatypes:
    | Temporal Literal | Time instant | xsd:dateTime | xsd:decimal | xsd:duration | xsd:hexBinary | xsd:nonNegativeInteger |

    HealthDCAT-AP UML Class Diagram for Non-public Health Data Access Level

    The following diagram illustrates the HealthDCAT-AP extension to DCAT-AP 3.0, specifically designed for non-public health data access levels within the European Health Data Space framework.

    Figure: HealthDCAT-AP UML Class Diagram for Non-public Health Data Access Level

    HealthDCAT-AP UML Class Diagram for Restricted Health Data Access Level

    The following diagram illustrates the HealthDCAT-AP extension to DCAT-AP 3.0, specifically designed for restricted health data access levels.

    Figure: HealthDCAT-AP UML Class Diagram for Restricted Health Data Access Level

    HealthDCAT-AP UML Class Diagram for Public Health Data Access Level

    The following diagram illustrates the HealthDCAT-AP extension to DCAT-AP 3.0, specifically designed for public health data access levels.

    Figure: HealthDCAT-AP UML Class Diagram for Public Health Data Access Level

    Main Entities

    The main entities are those that form the core of the Application Profile. The properties and their associated constraints that apply in the context of this profile are listed in tabular form. Each row corresponds to one property. In addition to the constraints, cross-references to DCAT are provided. To save space, the following abbreviations are used:

    This reuse qualification assessment is made with respect to a specific version of HealthDCAT-AP. It may therefore vary over time as new versions of HealthDCAT-AP are created.

    Agent

    Definition
    Any entity carrying out actions with respect to the entities Catalogue and the Catalogued Resources.
    Reference in DCAT
    Link
    Usage Note
    If the Agent is an organisation, the use of the Organization Ontology is recommended.
    Properties
    For this entity the following properties are defined: name, type.
    Property Range Card Definition Usage Reuse
    name Literal 1..* A name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). A
    type Concept 0..1 The nature of the agent. A
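The Card column above can be checked mechanically. The following Python sketch, assuming a described resource is modelled as a dictionary from property label to a list of values, parses cardinalities such as `1..*`, `0..1` and `1` and reports violations. The function names and the sample agent are hypothetical and not part of the specification.

```python
def parse_card(card):
    """Turn a cardinality such as '1..*', '0..1' or '1' into (minimum, maximum)."""
    low, sep, high = card.partition("..")
    if not sep:
        high = low  # a bare '1' means exactly one
    maximum = None if high == "*" else int(high)  # None means unbounded
    return int(low), maximum


def check(resource, constraints):
    """Return a list of human-readable violations for one described resource."""
    problems = []
    for prop, card in constraints.items():
        minimum, maximum = parse_card(card)
        count = len(resource.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {count}")
        if maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {count}")
    return problems


# Cardinalities copied from the Agent table above.
AGENT = {"name": "1..*", "type": "0..1"}

# Hypothetical agent with names in two languages and no type.
agent = {"name": ["Example Institute", "Institut Exemple"]}
print(check(agent, AGENT))  # prints []
```

The same `check` function can be reused for other entities by supplying the cardinalities from their tables. A production validator would more likely rely on SHACL shapes than on hand-written checks.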

    Health Data Access Body

    Definition
    Health Data Access Body supporting access to data in the Member State.
    Usage Note
    Health Data Access Bodies are responsible for processing health data applications and granting access to electronic health data for secondary use if the application is accepted. The contact point is mandatory.
    Property Range Card Definition Usage Reuse
    name Literal 1..* A name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). A
    type Concept 0..1 The nature of the agent. A
    contact point Kind 1 Contact information that can be used to contact the Agent. This property points to a contact point (Individual, Organization, Location, Group) that can answer questions about the dataset. Details on how to describe these are provided under class vcard:Kind. P

    Publisher

    Definition
    Agent responsible for making the health data resource available.
    Usage Note
    A Publisher is generally the agent responsible for making the health data resource available to the research and development community for use within applicable legal frameworks. It may be the health data holder or the organisation facilitating access on behalf of the data holder. In the EHDS context, this includes health data holders who directly publish data, intermediary organisations that facilitate access, and designated authorities like Health Data Access Bodies (HDABs) that manage access permissions and data distribution.
    Property Range Card Definition Usage Reuse
    contact point Kind 1 Contact information that can be used for sending comments about the Publisher. This property provides contact details for inquiries related to data access, usage rights, or technical support from the publisher. P
    type Concept 0..1 The nature or genre of the Publisher. This property indicates the category of the publisher, such as governmental body, academic institution, healthcare provider, or data intermediary. The NAL Health Publisher Types (EHDS) must be used. E
    description Literal 0..* A free-text description of the Publisher (Publisher Note). This property provides detailed information about the publisher's role, scope of data publishing activities, and relevant organizational details. P
    trusted data holder xsd:boolean 0..1 Indicates whether the Publisher is recognized as a trusted data holder under relevant health data governance frameworks. This property specifies if the publisher has been formally designated or certified as a trusted entity for handling and distributing health data under applicable regulatory frameworks. P

    Catalogue

    Definition
    A catalogue or repository that hosts the Datasets or Data Services being described.
    Reference in DCAT
    Link
    Properties
    For this entity the following properties are defined: applicable legislation, catalogue, creator, dataset, description, geographical coverage, has part, homepage, language, licence, modification date, publisher, record, release date, rights, service, temporal coverage, themes, title.
    Property Range Card Definition Usage Reuse
    applicable legislation Legal Resource 1..* The legislation that mandates the creation or management of the Catalogue. The ELI of the EHDS Regulation was published in March 2025 and can now be included as the applicable legislation; use the value for the EHDS Regulation (http://data.europa.eu/eli/reg/2025/327/oj) if applicable. As multiple legislations may apply to the resource, the maximum cardinality is not limited. E
    catalogue Catalogue 0..* A catalogue whose contents are of interest in the context of this catalogue. For certain research projects, multiple catalogs may need to be organized in a nested manner. This property serves to connect the different catalogs with each other. A
    creator Agent 0..1 An entity responsible for the creation of the catalogue. The Agent who played the role of creator of the catalogue. A
    dataset Dataset 0..* A Dataset that is part of the Catalogue. Each catalog contains one or more datasets. This property serves to link datasets to a catalogue. Therefore, a dataset is always contained inside a catalogue. A
    description Literal 1..* A free-text account of the Catalogue. Briefly describe the catalog and what it contains. You can repeat this in multiple languages. A
    geographical coverage Location 0..* A geographical area covered by the Catalogue. The EU Vocabularies Name Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used. A
    has part Catalogue 0..* A related Catalogue that is part of the described Catalogue. Use this property to include another catalogue in this catalogue. A related resource that is included either physically or logically in the described resource. A
    homepage Document 0..1 A web page that acts as the main page for the Catalogue. The home page of the catalogue, if available. A
    language Linguistic system 0..* A language used in the textual metadata describing titles, descriptions, etc. of the Datasets in the Catalogue. This property can be repeated if the metadata is provided in multiple languages. A
    licence Licence Document 0..1 A licence under which the Catalogue can be used or reused. The licence under which the catalogue is made available. A
    modification date Temporal Literal 0..1 The most recent date on which the Catalogue was modified. The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    publisher Agent 0..1 An entity (organisation) responsible for making the Catalogue available. If the publisher exists in the EU Corporate bodies NAL, the corresponding entry should be used. A
    record Catalogue Record 0..* A Catalogue Record that is part of the Catalogue. Link to a CatalogRecord class when applicable. A
    release date Temporal Literal 0..1 The date of formal issuance (e.g., publication) of the Catalogue. The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    rights Rights statement 0..* A statement that specifies rights associated with the Catalogue. A statement that concerns all rights not addressed in fields License or Rights, such as copyright statements. Everything that is not covered by the licence. A
    service Data Service 0..* A site or end-point (Data Service) that is listed in the Catalogue. Some datasets may have real-time Data Services (e.g., Beacon API counting individuals). IT teams should define the relationship between the catalog and the Data Service via this property. A
    temporal coverage Period of Time 0..* A temporal period that the Catalogue covers. The start and end date of the period that the catalogue covers. This property makes use of the class dct:PeriodOfTime (see the Period of Time class for details). The start and end of the interval SHOULD be given by using properties dcat:startDate or time:hasBeginning, and dcat:endDate or time:hasEnd, respectively. A
    themes Concept Scheme 0..* A knowledge organization system used to classify the Resources that are in the Catalogue. This property refers to a knowledge organization system used to classify the Catalogue's Datasets. It must have at least the value NAL:data-theme as this is the mandatory controlled vocabulary for dcat:theme. A
    title Literal 1..* A name given to the Catalogue. Provide a title(s) for your catalog, which can be repeated in multiple languages. Example: COVID 19 Study Catalogue. A
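The modification date and release date rows above require values typed as xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. The following Python sketch picks a datatype from the lexical form of a value. The patterns are simplified assumptions that check the shape only, not calendar validity (a month of 13 would pass), and the function name is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Optional timezone suffix shared by all four lexical forms.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"

# Simplified lexical patterns, ordered from most to least specific.
_PATTERNS = [
    ("dateTime", re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ)),
    ("date", re.compile(r"-?\d{4,}-\d{2}-\d{2}" + _TZ)),
    ("gYearMonth", re.compile(r"-?\d{4,}-\d{2}" + _TZ)),
    ("gYear", re.compile(r"-?\d{4,}" + _TZ)),
]


def temporal_datatype(value):
    """Return the XSD datatype IRI matching the lexical form, or None."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return XSD + name
    return None
```

With these patterns, the example value `2023-12-10T13:16:10.246Z` from the table is classified as xsd:dateTime, while a value such as `10/12/2023` is rejected with `None`.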

    Catalogue Record

    Definition
    A description of a Catalogued Resource's entry in the Catalogue.
    Reference in DCAT
    Link
    Properties
    For this entity the following properties are defined: application profile, change type, description, language, listing date, modification date, primary topic, source metadata, title.
    Property Range Card Definition Usage Reuse
    application profile Standard 0..* An Application Profile that the Catalogued Resource's metadata conforms to. This property identifies the profile to which the Dataset conforms, for example GeoDCAT-AP, HealthDCAT-AP or StatDCAT-AP. A
    change type Concept 0..1 The status of the catalogue record in the context of editorial flow of the dataset and data service descriptions. This property indicates the current status of the catalogue record as part of the editorial workflow governing the management and curation of dataset and data service descriptions. It may reflect stages such as "draft", "under review", "published", "deprecated", or "archived". A
    description Literal 0..* A free-text account of the record. This property can be repeated for parallel language versions of the description. Briefly describe the catalog record and what it contains. You can repeat this in multiple languages. A
    language Linguistic system 0..* A language used in the textual metadata describing titles, descriptions, etc. of the Catalogued Resource. This property can be repeated if the metadata is provided in multiple languages. A
    listing date Temporal Literal 0..1 The date on which the description of the Resource was included in the Catalogue. The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    modification date Temporal Literal 1 The most recent date on which the Catalogue entry was changed or modified. The values must be data typed as either xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    primary topic Catalogued Resource 1 A link to the Dataset, Data Service or Catalogue described in the record. A catalogue record will refer to one entity in a catalogue. This can be either a Dataset or a Data Service. To ensure an unambiguous reading of the cardinality, the range is set to Catalogued Resource. However, this range is not intended to require the explicit use of the class Catalogued Resource. As it is an abstract class, a subclass should be used. A
    source metadata Catalogue Record 0..1 The original metadata that was used in creating metadata for the Dataset, Data Service or Dataset Series. This property identifies the original metadata (Catalogue Record) from which the metadata of this Catalogue Record was created. A
    title Literal 0..* A name given to the Catalogue Record. This property can be repeated for parallel language versions of the name. A

    Catalogued Resource

    Definition
    Resource published or curated by a single agent.
    Reference in DCAT
    Link
    Usage Note
    The class Catalogued Resource is an abstract class in DCAT-AP. Therefore, only its subclasses should be used in a data exchange.
    Properties
    This specification does not impose any additional requirements on properties for this entity.

    Checksum

    Definition
    A value that allows the contents of a file to be authenticated.
    Reference in DCAT
    Link
    Usage Note
    This class allows the results of a variety of checksum and cryptographic message digest algorithms to be represented.
    Properties
    For this entity the following properties are defined: algorithm, checksum value.
    Property Range Card Definition Usage Reuse
    algorithm Checksum Algorithm 1 Identifies the algorithm used to produce the subject Checksum. Choose one member of the checksum algorithm members from https://spdx.org/rdf/terms/#d4e2129 A
    checksum value xsd:hexBinary 1 The checksumValue property provides a lower-case hexadecimal encoded digest value produced using a specific algorithm. A
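A lower-case hexadecimal digest of the kind described above can be produced with Python's standard `hashlib` module. The sketch below assumes SHA-256 as the default algorithm, which is a choice made for illustration; mapping the algorithm name to the corresponding SPDX checksum algorithm member is not shown, and the function names are hypothetical.

```python
import hashlib


def checksum_value(data, algorithm="sha256"):
    """Return the digest of `data` (bytes) as a lower-case hexadecimal string."""
    digest = hashlib.new(algorithm)
    digest.update(data)
    return digest.hexdigest()  # hexdigest() already yields lower-case hex


def matches(data, expected, algorithm="sha256"):
    """Compare a recomputed digest with a published checksum value."""
    return checksum_value(data, algorithm) == expected.strip().lower()
```

A receiver could call `matches` on a downloaded distribution to confirm that the file corresponds to the published checksum value.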

    Data Service

    Definition
    A collection of operations that provides access to one or more datasets or data processing functions.
    Reference in DCAT
    Link
    Subclass of
    Catalogued Resource
    Properties
    For this entity the following properties are defined: access rights, applicable legislation, conforms to, contact point, description, documentation, endpoint description, endpoint URL, format, keyword, landing page, licence, publisher, serves dataset, theme, title.
    Property Range Card Definition Usage Reuse
    access rights Rights statement 0..1 Information regarding access or restrictions based on privacy, security, or other policies. The NAL access rights must be used. A
    applicable legislation Legal Resource 0..* The legislation that mandates the creation or management of the Data Service. The ELI of the EHDS Regulation was published in March 2025 and can now be included as the applicable legislation; the value must include the ELI of the EHDS Regulation (http://data.europa.eu/eli/reg/2025/327/oj) if applicable. As multiple legislations may apply to the resource, the maximum cardinality is not limited. A
    conforms to Standard 0..* An established (technical) standard to which the Data Service conforms. The standards referred here SHOULD describe the Data Service and not the data it serves. The latter is provided by the dataset with which this Data Service is connected. For instance the data service adheres to the OGC WFS API standard, while the associated dataset adheres to the INSPIRE Address data model. A
    contact point Kind 0..* Contact information that can be used for sending comments about the Data Service. This property informs about a contact point (Individual, Organization, Location, Group) that can answer questions about the DataService. Details on how to describe these are provided under class vcard:Kind. A
    description Literal 0..* A free-text account of the Data Service. Provide specific details about the data service here, complementing the description of the related Dataset. This field can be repeated for different language versions of the description. A
    documentation Document 0..* A page or document about this Data Service. A
    endpoint description Resource 0..* A description of the services available via the end-points, including their operations, parameters etc. The property gives specific details of the actual endpoint instances, while dct:conformsTo is used to indicate the general standard or specification that the endpoints implement. Provides technical documentation that explains how to access and interact with the data service's endpoint. A
    endpoint URL Resource 1..* The root location or primary endpoint of the service (an IRI). Provide the URL of the endpoint that users can interact with to access the data service. This should be a direct link to the service's endpoint, such as an API URL, SPARQL endpoint, or similar. A
    format Media Type or Extent 0..* The structure that can be returned by querying the endpointURL. This property can be used to describe a media format in more detail than "media type" when needed. Instances of this property should use a value from the file type NAL. A
    keyword Literal 0..* A keyword or tag describing the Data Service. Add keywords to increase DataService discoverability. A
    landing page Document 0..* A web page that provides access to the Data Service and/or additional information. It is intended to point to a landing page at the original data service provider, not to a page on a site of a third party, such as an aggregator. A
    licence Licence Document 0..1 A licence under which the Data service is made available. The licence under which the data service is made available. A
    publisher Agent 0..1 An entity (organisation) responsible for making the Data Service available. The organization or individual responsible for making the data service available. In the context of data services, the publisher is typically the organization that manages or provides access to the service. For details, see the class Agent. A
    serves dataset Dataset 0..* This property refers to a collection of data that this data service can distribute. This property connects the Data Service class to its corresponding dataset(s), ensuring every data service links to at least one dcat:Dataset. While essential for metadata implementation teams on each node. A
    theme Concept 0..* A category of the Data Service. This property may use a controlled vocabulary. In the Health Data Catalogue, any entry from the controlled vocabulary can be used to describe the theme of the Data Service. Example: 'HEAL' (http://publications.europa.eu/resource/authority/data-theme/HEAL) for a Data Service that has the Health theme. A
    title Literal 1..* A name given to the Data Service. Provide a unique title for your Data service, which can be repeated in multiple languages. A
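The access rights row above requires values from the NAL access rights. The following Python sketch maps the three access categories named in the Abstract (public, restricted, non-public) to concept IRIs in the EU Vocabularies access-right authority table. The function name and the input normalisation are assumptions made for illustration.

```python
# Base IRI of the EU Vocabularies access-right authority table.
ACCESS_RIGHT = "http://publications.europa.eu/resource/authority/access-right/"

# The three categories named in the Abstract, keyed by the wording used there.
ACCESS_LEVELS = {
    "public": ACCESS_RIGHT + "PUBLIC",
    "restricted": ACCESS_RIGHT + "RESTRICTED",
    "non-public": ACCESS_RIGHT + "NON_PUBLIC",
}


def access_rights_iri(level):
    """Map a human-readable access level to its access-right concept IRI."""
    # Accept variants such as 'Non-public', 'NON_PUBLIC' or 'non public'.
    key = level.strip().lower().replace("_", "-").replace(" ", "-")
    try:
        return ACCESS_LEVELS[key]
    except KeyError:
        raise ValueError(f"unsupported access level: {level!r}") from None
```

Rejecting unknown labels with an error, rather than passing them through, keeps free-text values out of a property that is meant to hold controlled vocabulary terms.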

    Dataset

    Definition
    A conceptual entity that represents the information published.
    Reference in DCAT
    Link
    Usage Note
    If a Dataset is used as part of a Dataset Series, the usage of the properties listed below must be coherent with the associated Dataset Series. For this usage, consult the guidelines in section [[[#UsageGuidelines]]].
    Subclass of
    Catalogued Resource
    Properties
    For this entity the following properties are defined: access rights, alternative, analytics, applicable legislation, code values, coding system, conforms to, contact point, creator, dataset distribution, description, documentation, frequency, geographical coverage, has version, health category, health data access body, health theme, identifier, in series, is referenced by, keyword, landing page, language, legal basis, maximum typical age, minimum typical age, modification date, number of records, number of unique individuals, other identifier, personal data, population coverage, provenance, publisher, purpose, qualified attribution, qualified relation, quality annotation, related resource, release date, retention period, sample, source, spatial resolution, temporal coverage, temporal resolution, theme, title, type, version, version notes, was generated by.
    Select the access level to see appropriate cardinalities and requirements for health datasets.
    Property Range Card Definition Usage Reuse
    access rights Rights statement 1 Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public. E
    applicable legislation Legal Resource 1..* The legislation that mandates the creation or management of the Dataset. P P
    conforms to Standard 0..* An implementing rule or other specification. A
    contact point Kind 0..* Contact information that can be used for sending comments about the Dataset. A
    creator Agent 0..* An entity responsible for producing the dataset. A
    dataset distribution Distribution 0..* An available Distribution for the Dataset. A
    description Literal 1..* A free-text account of the Dataset. This property can be repeated for parallel language versions of the description. E
    documentation Document 0..* A page or document about this Dataset. P
    frequency Frequency 0..1 The frequency at which the Dataset is updated. E
    geographical coverage Location 0..* A geographic region that is covered by the Dataset. E
    has version Dataset 0..* A related Dataset that is a version, edition, or adaptation of the described Dataset. A
    alternative Literal 0..1 An alternative name for the dataset. This property provides an alternative title or name for the dataset.
    health category Concept 1..* The health category to which this dataset belongs as described in the Commission Regulation on the European Health Data Space laying down a list of categories of electronic data for secondary use, Art.51. A mandatory controlled vocabulary denoting health data within the scope of the Commission Regulation is provided.
    health data access body Agent 1..1 The health data access body responsible for providing access to this health dataset. Required by EHDS Regulation Article 77 for all health datasets.
    analytics Distribution 0..* Links to analytical resources or tools related to this dataset. Provides access to analytical capabilities or derived analytics for the dataset.
    code values Concept 0..* Code values used in the dataset. Provides information about coding systems and code values used in the health dataset.
    coding system Standard 0..* The coding system(s) used in this health dataset. Identifies medical coding systems (e.g., ICD-10, SNOMED CT) used in the dataset.
    health theme Concept 0..* Health-specific thematic category of the dataset. Provides health-specific categorization beyond general DCAT themes.
    number of records xsd:nonNegativeInteger 0..1 The number of records contained in this dataset. Provides quantitative information about dataset size for health data processing and analysis planning.
    retention period Period of time 0..1 The period for which the health data is retained. Important for GDPR compliance and health data governance requirements.
    maximum typical age xsd:nonNegativeInteger 0..1 The maximum typical age of individuals represented in this health dataset. Helps users understand the age range of the population covered by the health data.
    minimum typical age xsd:nonNegativeInteger 0..1 The minimum typical age of individuals represented in this health dataset. Helps users understand the age range of the population covered by the health data.
    number of unique individuals xsd:nonNegativeInteger 0..1 The number of unique individuals represented in this health dataset. Important for privacy assessment and understanding the scope of the health data.
    personal data Personal Data 0..1 Indicates whether the dataset contains personal data as defined by GDPR. Critical for GDPR compliance and data protection impact assessments.
    population coverage Literal 0..* The population or demographic groups covered by this health dataset. Describes the specific populations represented in the health data for research and analysis purposes.
    identifier Literal 1..n The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue. E
    in series Dataset Series 0..* A dataset series of which the dataset is part. E
    is referenced by Resource 0..* A related resource, such as a publication, that references, cites, or otherwise points to the dataset. A
    keyword Literal 0..* A keyword or tag describing the Dataset. A
    landing page Document 0..* A web page that provides access to the Dataset, its Distributions and/or additional information. It is intended to point to a landing page at the original data provider, not to a page on a site of a third party, such as an aggregator. A
    language Linguistic system 0..* A language of the Dataset. This property can be repeated if there are multiple languages in the Dataset. E
    modification date Temporal Literal 0..1 The most recent date on which the Dataset was changed or modified. E
    other identifier Identifier 0..* A secondary identifier of the Dataset. Examples are MAST/ADS [[MASTADS]], DOI [[DOI]], EZID [[EZID]] or W3ID [[W3ID]]. E
    provenance Provenance Statement 0..* A statement about the lineage of a Dataset. P
    publisher Agent 0..1 An entity (organisation) responsible for making the Dataset available. E
    qualified attribution Attribution 0..* An Agent having some form of responsibility for the resource. A
    qualified relation Relationship 0..* A description of a relationship with another resource. A
    related resource Resource 0..* A related resource. A
    release date Temporal Literal 0..1 The date of formal issuance (e.g., publication) of the Dataset. E
    sample Distribution 0..* A sample distribution of the dataset. P
    source Dataset 0..* A related Dataset from which the described Dataset is derived. P
    spatial resolution xsd:decimal 0..* The minimum spatial separation resolvable in a dataset, measured in meters. A
    temporal coverage Period of time 0..* A temporal period that the Dataset covers. A
    temporal resolution xsd:duration 0..1 The minimum time period resolvable in the dataset. E
    theme Concept 0..* A category of the Dataset. A Dataset may be associated with multiple themes. E
    title Literal 1..* A name given to the Dataset. This property can be repeated for parallel language versions of the name. E
    type Concept 0..* A type of the Dataset. A recommended controlled vocabulary data-type is foreseen. E
    version Literal 0..1 The version indicator (name or identifier) of a resource. E
    version notes Literal 0..* A description of the differences between this version and a previous version of the Dataset. This property can be repeated for parallel language versions of the version notes. P
    was generated by Activity 0..* An activity that generated, or provides the business context for, the creation of the dataset. A
    purpose Purpose 0..* A category of data processing indicating the reason, objective or goal for which data is processed. RDF example: dpv:hasPurpose P
    legal basis Legal Basis 0..* The legal basis justifying the collection, holding and processing of personal data. RDF example: dpv:hasLegalBasis P
    quality annotation Quality Certificate 0..* The quality annotation(s) associated with a dataset. RDF example: dqv:hasQualityAnnotation P
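
    The cardinalities in the table above can be checked mechanically. The following is a minimal sketch, not part of the specification: it assumes a hypothetical record format (a plain dictionary keyed by property label, with a list of values per property) and copies only a handful of rows from the table as displayed. The helper names parse_cardinality and check_record are illustrative.

```python
import re

# Cardinalities copied from a few rows of the Dataset table above.
# The selection is illustrative, not the full list of properties.
DATASET_CARDINALITIES = {
    "access rights": "1",
    "applicable legislation": "1..n",
    "description": "1..*",
    "title": "1..*",
    "publisher": "0..1",
    "keyword": "0..*",
}


def parse_cardinality(card):
    """Return (minimum, maximum) for strings such as '1', '0..1', '1..n', '0..*'.

    The maximum is None when the upper bound is unlimited ('n' or '*').
    """
    match = re.fullmatch(r"(\d+)(?:\.\.(\d+|n|\*))?", card.strip())
    if match is None:
        raise ValueError(f"unrecognised cardinality: {card!r}")
    minimum = int(match.group(1))
    upper = match.group(2)
    if upper is None:
        return minimum, minimum
    if upper in ("n", "*"):
        return minimum, None
    return minimum, int(upper)


def check_record(record, cardinalities):
    """Return a sorted list of human-readable cardinality violations."""
    problems = []
    for prop, card in cardinalities.items():
        minimum, maximum = parse_cardinality(card)
        count = len(record.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {count}")
        elif maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {count}")
    return sorted(problems)


# Hypothetical record: no applicable legislation, two publishers.
example_record = {
    "title": ["Cancer registry extract"],
    "description": ["Synthetic example record."],
    "access rights": ["non-public"],
    "publisher": ["Agency A", "Agency B"],
}
for problem in check_record(example_record, DATASET_CARDINALITIES):
    print(problem)
```

    The sketch treats "n" and "*" as the same unlimited upper bound, since the table uses both notations. A real validator would more likely work on the RDF graph, for example with SHACL shapes, rather than on dictionaries.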

    Dataset Series

    Definition
    A collection of datasets that are published separately, but share some characteristics that group them.
    Reference in DCAT
    Link
    Usage Note
    It is recommended to avoid Dataset Series without a dataset in the collection. Therefore at least one Dataset should refer to a Dataset Series using the property in series (dcat:inSeries).
    Subclass of
    Catalogued Resource
    Properties
    For this entity the following properties are defined: applicable legislation , contact point , description , frequency , geographical coverage , modification date , publisher , release date , temporal coverage , title .
    Property Range Card Definition Usage Reuse
    applicable legislation Legal Resource 1..* The legislation that mandates the creation or management of the Dataset Series. The ELI of the EHDS Regulation was published in March 2025 and can now be used as the value of applicable legislation (http://data.europa.eu/eli/reg/2025/327/oj), if applicable. As multiple legislations may apply to the resource, the maximum cardinality is not limited. E
    contact point Kind 0..* Contact information that can be used for sending comments about the Dataset Series. This property informs about a contact point (Individual, Organization, Location, Group) that can answer questions about the Dataset Series. Details on how to describe these are provided under class vcard:Kind. A
    description Literal 1..* A free-text account of the Dataset Series. This property can be repeated for parallel language versions. It is recommended to indicate the dimensions along which the Dataset Series evolves. A
    frequency Frequency 0..1 The frequency at which the Dataset Series is updated. The frequency of a dataset series is not equal to the frequency of the dataset in the collection. A
    geographical coverage Location 0..* A geographic region that is covered by the Dataset Series. When spatial coverage is a dimension in the dataset series then the spatial coverage of each dataset in the collection should be part of the spatial coverage. In that case, an open ended value is recommended, e.g. EU or a broad bounding box covering the expected values. A
    modification date Temporal Literal 0..1 The most recent date on which the Dataset Series was changed or modified. This is not the same as the modification date of the most recently modified dataset in the series. The value must be typed as xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    publisher Agent 0..1 An entity (organisation) responsible for ensuring the coherency of the Dataset Series. The publisher of the dataset series may not be the publisher of all datasets, e.g. a digital archive could take over the publishing of older datasets in the series. A
    release date Temporal Literal 0..1 The date of formal issuance (e.g., publication) of the Dataset Series. The moment when the dataset series was established as a managed resource. This is not the same as the release date of the oldest dataset in the series. The value must be typed as xsd:date, xsd:dateTime, xsd:gYear or xsd:gYearMonth. Example: 2023-12-10T13:16:10.246Z. A
    temporal coverage Period of Time 0..* A temporal period that the Dataset Series covers. When temporal coverage is a dimension in the dataset series then the temporal coverage of each dataset in the collection should be part of the temporal coverage. In that case, an open ended value is recommended, e.g. after 2012. A
    title Literal 1..* A name given to the Dataset Series. Provide a unique title for your Dataset Series, which can be repeated in multiple languages. A
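
    The modification date and release date rows above allow four XML Schema datatypes. As a rough illustration, the sketch below picks the datatype from the lexical form of a value. It is an assumption-laden simplification: the function name temporal_datatype is hypothetical, and the patterns only check the shape of the string, not whether the month, day or time values are in range.

```python
import re

# Optional time zone suffix, e.g. "Z" or "+01:00".
_TZ = r"(?:Z|[+-]\d{2}:\d{2})?"

# Ordered from most to least specific lexical form.
_PATTERNS = [
    ("xsd:dateTime",
     re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?" + _TZ)),
    ("xsd:date", re.compile(r"\d{4}-\d{2}-\d{2}" + _TZ)),
    ("xsd:gYearMonth", re.compile(r"\d{4}-\d{2}" + _TZ)),
    ("xsd:gYear", re.compile(r"\d{4}" + _TZ)),
]


def temporal_datatype(value):
    """Return the name of the first datatype whose lexical shape matches."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return name
    raise ValueError(f"not a supported temporal literal: {value!r}")


print(temporal_datatype("2023-12-10T13:16:10.246Z"))
print(temporal_datatype("2023-12"))
```

    A production implementation would normally rely on an XML Schema aware library for full validation instead of regular expressions.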

    Distribution

    Definition
    A physical embodiment of the Dataset in a particular format.
    Reference in DCAT
    Link
    Properties
    For this entity the following properties are defined: access service , access URL , applicable legislation , availability , byte size , checksum , compression format , description , documentation , download URL , format , has policy , language , licence , linked schemas , media type , modification date , packaging format , release date , rights , spatial resolution , status , temporal resolution , title .
    Property Range Card Definition Usage Reuse
    access service Data Service 0..* A data service that gives access to the distribution of the dataset. Indicate the specific data service through which this distribution can be accessed programmatically. A
    access URL Resource 1..* A URL that gives access to a Distribution of the Dataset. Provide the direct access point URL where users can retrieve or interact with this specific distribution of the dataset. A
    applicable legislation Legal Resource 1..* The legislation that mandates the creation or management of the Distribution. Reference the specific legal instruments that require or govern the creation, management, and provision of this distribution, particularly the European Health Data Space Regulation. E
    availability Concept 0..1 An indication how long it is planned to keep the Distribution of the Dataset available. Specify the planned duration of availability using controlled vocabulary terms that indicate retention policies and access guarantees. A
    byte size xsd:nonNegativeInteger 0..1 The size of a Distribution in bytes. Provide the exact file size in bytes to help users understand storage requirements and download expectations. A
    checksum Checksum 0..1 A mechanism that can be used to verify that the contents of a distribution have not changed. Provide cryptographic hash values (e.g., MD5, SHA-256) to enable verification of file integrity and detect any unauthorized modifications. A
    compression format Media Type 0..1 The format of the file in which the data is contained in a compressed form, e.g. to reduce the size of the downloadable file. Specify compression algorithms used (e.g., gzip, zip, bzip2) using IANA media types to inform users about decompression requirements. A
    description Literal 0..* A free-text account of the Distribution. Provide detailed information about the distribution's content, format characteristics, intended use cases, and any special considerations for health data usage. A
    documentation Document 0..* A page or document about this Distribution. Reference technical documentation, data dictionaries, schema files, or usage guides that help users understand and properly utilize this distribution. A
    download URL Resource 0..* A URL that is a direct link to a downloadable file in a given format. Provide direct download links that allow immediate file retrieval without requiring navigation through intermediate pages or authentication processes. A
    format Media Type or Extent 0..1 The file format of the Distribution. Specify the technical format using standardized media types (e.g., CSV, JSON, XML, FHIR) to indicate data structure and required processing tools. A
    has policy Policy 0..1 The policy expressing the rights associated with the distribution if using the [[ODRL]] vocabulary. Define machine-readable usage policies using ODRL vocabulary to specify permissions, prohibitions, and duties related to health data access and use. A
    language Linguistic system 0..* A language used in the Distribution. Specify the natural languages used in textual content within the distribution using ISO 639 language codes for international interoperability. A
    licence Licence Document 0..1 A licence under which the Distribution is made available. Reference the legal terms and conditions governing access, use, and redistribution of this distribution, ensuring compliance with health data regulations. A
    linked schemas Standard 0..* An established schema to which the described Distribution conforms. Reference technical standards, data models, or schema specifications (e.g., HL7 FHIR, OMOP CDM) that define the structure and semantics of this distribution. A
    media type Media Type 0..1 The media type of the Distribution as defined in the official register of media types managed by IANA. Specify the precise IANA media type (e.g., application/json, text/csv, application/fhir+json) to enable proper content handling by consuming applications. A
    modification date Temporal Literal 0..1 The most recent date on which the Distribution was changed or modified. Record the timestamp when the distribution was last updated to help users assess data currency and track version history. A
    packaging format Media Type 0..1 The format of the file in which one or more data files are grouped together, e.g. to enable a set of related files to be downloaded together. Specify container formats (e.g., tar, zip, 7z) used to bundle multiple files together, using IANA media types for format identification. A
    release date Temporal Literal 0..1 The date of formal issuance (e.g., publication) of the Distribution. Document when this distribution was first published or made available to establish temporal context and version tracking. A
    rights Rights statement 0..* A statement that specifies rights associated with the Distribution. Detail intellectual property rights, usage restrictions, and access permissions governing this distribution, complementing license information. A
    spatial resolution xsd:decimal 0..1 The minimum spatial separation resolvable in a dataset distribution, measured in meters. Specify the geographical precision in meters for spatially-referenced health data (e.g., postal code level, administrative region level). A
    status Concept 0..1 The status of the distribution in the context of maturity lifecycle. Indicate the lifecycle stage using controlled vocabulary (Completed, Deprecated, Under Development, Withdrawn) to inform users about distribution reliability and future availability. A
    temporal resolution xsd:duration 0..1 The minimum time period resolvable in the dataset distribution. Specify the finest temporal granularity of data points using ISO 8601 duration format (e.g., daily, monthly, yearly data collection intervals). A
    title Literal 0..* A name given to the Distribution. Provide a descriptive name that clearly identifies this specific distribution format, version, or subset to distinguish it from other distributions of the same dataset. A
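
    The byte size and checksum rows above both describe values that can be derived from the distributed file itself. The following sketch shows one way to compute them with the Python standard library. The function name describe_stream and the keys of the returned dictionary are illustrative assumptions; SHA-256 is used because the table mentions it as an example, not because the specification mandates it.

```python
import hashlib
import io


def describe_stream(stream, chunk_size=65536):
    """Read a binary stream in chunks and return its size and SHA-256 digest."""
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
    return {
        "byte size": size,
        "checksum algorithm": "SHA-256",
        "checksum value": digest.hexdigest(),
    }


# In-memory stand-in for a small CSV distribution (hypothetical content).
sample = io.BytesIO(b"patient_id,age\n1,42\n")
info = describe_stream(sample)
print(info["byte size"], info["checksum value"])
```

    Reading in chunks keeps memory use constant for large files. The same function works with a file opened in binary mode, e.g. open(path, "rb").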

    Kind

    Definition
    A description following the vCard specification.
    Usage Note
    Note that the class Kind is the parent class for the four explicit types of vCard (Individual, Organization, Location, Group). It is mandatory to provide at least either an email or a contact form from e.g. a service desk.
    Properties
    For this entity the following properties are defined: contact page , email
    Property Range Card Definition Usage Reuse
    contact page Resource 0..1 A webpage that either allows contact to be made (i.e. a web form) or explains how to get in contact. It is recommended to provide at least either an email or a contact form from e.g. a service desk. P
    email Resource 0..1 An email address via which contact can be made. It is recommended to provide at least either an email or a contact form from e.g. a service desk. When naming a contact point this information needs to be further specified with additional information, i.e., an email address. This email address does not need to be a direct contact to the person responsible for the management of the data; it could be a generic information email. The email address has to be provided with the mailto: prefix. For example: mailto:info@example.com or mailto:jane.doe@example.com P
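
    The mailto: requirement on the email property can be enforced when metadata is prepared. The sketch below is one possible normalisation step, under the assumption that input may arrive with or without the prefix and with stray whitespace. The function name to_mailto is hypothetical, and the address check is deliberately shallow (one "@", non-empty parts, no spaces), not a full address validation.

```python
def to_mailto(address):
    """Return the address as a mailto: IRI, adding the prefix if missing."""
    cleaned = address.strip()
    if cleaned.lower().startswith("mailto:"):
        cleaned = cleaned[len("mailto:"):].strip()
    local, separator, domain = cleaned.partition("@")
    if not separator or not local or not domain or " " in cleaned:
        raise ValueError(f"not a usable email address: {address!r}")
    return "mailto:" + cleaned


print(to_mailto("info@example.com"))
print(to_mailto("mailto: jane.doe@example.com"))
```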

    Licence Document

    Definition
    A legal document giving official permission to do something with a resource.
    Properties
    For this entity the following properties are defined: type .
    Property Range Card Definition Usage Reuse
    type Concept 0..* A type of licence, e.g. indicating 'public domain' or 'royalties required'. A

    Location

    Definition
    A spatial region or named place.
    Reference in DCAT
    Link
    Usage Note
    It can be represented using a controlled vocabulary or with geographic coordinates. In the latter case, the use of the Core Location Vocabulary is recommended, following the approach described in the GeoDCAT-AP specification.
    Properties
    For this entity the following properties are defined: bbox , centroid , geometry .
    Property Range Card Definition Usage Reuse
    bbox Literal 0..1 The geographic bounding box of a resource. A
    centroid Literal 0..1 The geographic center (centroid) of a resource. A
    geometry Geometry 0..1 The corresponding geometry for a resource. A
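
    The table above gives the range of bbox only as Literal. As an illustration, the sketch below serialises a bounding box as a WKT polygon, on the assumption that the WKT encoding used in GeoDCAT-AP is followed, with coordinates in longitude/latitude order. The function name bbox_to_wkt is hypothetical, and boxes that cross the antimeridian are not handled.

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT polygon ring for a longitude/latitude bounding box."""
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    corners = [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


print(bbox_to_wkt(2, 49, 7, 52))
```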

    Relationship

    Definition
    An association class for attaching additional information to a relationship between DCAT Resources.
    Reference in DCAT
    Link
    Properties
    For this entity the following properties are defined: had role , relation .
    Property Range Card Definition Usage Reuse
    had role Role 1..n A function of an entity or agent with respect to another entity or resource. A
    relation Resource 1..n A resource related to the source resource. A

    CSVW (Variables)

    The CSVW (CSV on the Web) specification provides a framework for describing tabular data and its metadata. In the context of HealthDCAT-AP, CSVW is used to define variable dictionaries that describe the structure and meaning of synthetic, anonymized health data samples.

    This specification defines the dictionary of variables that will be linked via the dcat:downloadURL of the sample distribution. The download URL provides a synthetic, anonymized example of the real data, and the dictionary describes the variables used in that example. The CSVW vocabulary enables detailed documentation of each variable including its technical name, human-readable title, description, datatype, and semantic relationships.

    The following three entities work together to provide comprehensive variable documentation:

    Table Group

    Definition
    This specification defines the dictionary of variables that will be linked via the dcat:downloadURL of the sample distribution. The download URL provides a synthetic, anonymized example of the real data, and the dictionary describes the variables used in that example.
    Reference in CSVW
    Link
    Properties
    For this entity the following properties are defined: table
    Property Range Card Definition Usage Reuse
    table Table 1..n A table is a single CSV file within a CSVW Table Group. Used to associate one or more tables with a table group. It holds all variable dictionary entries, using columns describing each variable. P

    Table

    Definition
    A table within a CSVW Table Group that represents a single CSV file with variable definitions.
    Reference in CSVW
    Link
    Properties
    For this entity the following properties are defined: url , title , keyword , column
    Property Range Card Definition Usage Reuse
    url IRI 0..1 The URL of the CSV file to which this table refers. The URL should be a valid IRI and should point to the CSV file (synthetic, anonymized representation of the real data) for this table. P
    title Literal 1..n Table title. Used to provide a human-readable title for the table. Can appear in multiple languages. P
    column Column 1..n Variable (column) defines metadata about one field in the CSV file. Each variable (column) node defines characteristics such as name, title, datatype, etc., for one column of the CSV. Should be either a blank node or IRI referring to a csvw:Column. P

    Column

    Definition
    A column within a CSV table that represents a variable with its metadata and characteristics.
    Reference in CSVW
    Link
    Properties
    For this entity the following properties are defined: name , title , description , datatype , property URL
    Property Range Card Definition Usage Reuse
    name Literal 1..n A technical name given to the Variable as it appears in the CSV file. Technical variable name required to match a column header in the CSV file (synthetic, anonymized representation of the real data), can appear in multiple languages. P
    title Literal 1..n A human-readable name given to the Variable. Human-readable title(s) for the column. May be multilingual. P
    description Literal 1..n Variable description. Used to provide human-readable information about the variable (column). Can appear in multiple languages. This is especially helpful for variable documentation. P
    datatype Literal 1 Specifies the expected datatype of the values in the column. Should match the label of one of the CSVW types as defined in the built-in datatypes from XML Schema (https://w3c.github.io/csvw/metadata/#datatypes). Examples: string, integer, boolean ... P
    property URL IRI 0..1 A URI template used to define the RDF predicate associated with the values in the column, relative to each row. The propertyUrl may point to standard, controlled vocabularies, or ontologies that define the semantic meaning of the column. For example: csvw:propertyUrl <http://publications.europa.eu/resource/authority/country>. For advanced usage, refer to the official specification: https://www.w3.org/TR/tabular-metadata/#propertyurl P
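
    The Column table above requires a name, title, description and datatype for every variable, and requires the name to match a header in the sample CSV file. The sketch below checks both conditions. It assumes a hypothetical, simplified in-memory form of the dictionary (a list of dictionaries keyed by the property labels used in the table), not the CSVW serialisation itself; check_dictionary is an illustrative name.

```python
import csv
import io

REQUIRED_FIELDS = ("name", "title", "description", "datatype")


def check_dictionary(columns, sample_csv_text):
    """Return a list of problems found in a variable dictionary.

    columns: list of dicts, one per variable (simplified, hypothetical form).
    sample_csv_text: content of the synthetic sample CSV file.
    """
    reader = csv.reader(io.StringIO(sample_csv_text))
    headers = next(reader, [])
    problems = []
    for column in columns:
        label = column.get("name") or "(unnamed)"
        for field in REQUIRED_FIELDS:
            if not column.get(field):
                problems.append(f"{label}: missing {field}")
        name = column.get("name")
        if name and name not in headers:
            problems.append(f"{name}: no matching header in sample CSV")
    return problems


# Hypothetical dictionary and sample file.
variables = [
    {"name": "age", "title": "Age",
     "description": "Age in years at inclusion.", "datatype": "integer"},
    {"name": "sex", "title": "Sex", "description": "", "datatype": "string"},
]
sample_csv = "patient_id,age\n1,42\n"
for problem in check_dictionary(variables, sample_csv):
    print(problem)
```

    The check only covers what the variable dictionary documents. It does not report CSV headers that lack a dictionary entry (patient_id in this example), which a stricter check might also flag.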

    Supportive Entities

    Supportive entities support the main entities in the Application Profile. They are included in the Application Profile because they form the range of properties.

    Activity

    Definition
    An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Attribution

    Definition
    Attribution is the ascribing of an entity to an agent.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Checksum Algorithm

    Definition
    Algorithm for Checksums.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Concept

    Definition
    An idea or notion; a unit of thought.
    Usage Note
    In HealthDCAT-AP, a Concept is used to denote codes within a codelist. In section [[[#controlled-vocs]]] the expectations are elaborated in more detail.
    Properties
    For this entity the following properties are defined: preferred label .
    Property Range Card Definition Usage Reuse
    preferred label Literal 1..n A preferred label of the concept. This property can be repeated for parallel language versions of the label. A

    Concept Scheme

    Definition
    An aggregation of one or more SKOS concepts.
    Usage Note
    In HealthDCAT-AP, a Concept Scheme is used to denote a codelist. In section [[[#controlled-vocs]]] the expectations are elaborated in more detail. The user must provide either dct:title, rdfs:label or skos:prefLabel.
    Properties
    For this entity the following properties are defined: title, preferred label, label.
    Property Range Card Definition Usage Reuse
    title Literal 0..* A name of the concept scheme. May be repeated for different versions of the name. A
    preferred label Literal 0..* A preferred label of the concept scheme. May be repeated for different versions of the preferred label. P
    label Literal 0..* A label of the concept scheme. May be repeated for different versions of the label. P

    Document

    Definition
    A textual resource intended for human consumption that contains information, e.g. a web page about a Dataset.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Frequency

    Definition
    A rate at which something recurs, e.g. the publication of a Dataset.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Geometry

    Definition
    The locn:Geometry class provides the means to identify a location as a point, line, polygon, etc. expressed using coordinates in some coordinate reference system.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Identifier

    Definition
    This is based on the UN/CEFACT Identifier class.
    Usage Note
    An identifier in a particular context, consisting of the
    • content string that is the identifier;
    • an optional identifier for the identifier scheme;
    • an optional identifier for the version of the identifier scheme;
    • an optional identifier for the agency that manages the identifier scheme.
    Properties
    For this entity the following properties are defined: notation .
    Property Range Card Definition Usage Reuse
    notation Literal 1 A string that is an identifier in the context of the identifier scheme referenced by its datatype. A

    Legal Resource

    Definition
    This class represents the legislation, policy or policies that lie behind the Rules that govern the service.
    Usage Note
    The definition and properties of the Legal Resource class are aligned with the ontology included in "Council conclusions inviting the introduction of the European Legislation Identifier (ELI)". For describing the attributes of a Legal Resource (labels, preferred labels, alternative labels, definition, etc.) we refer to the (ELI) ontology. In this data specification the use is restricted to instances of this class that follow the (ELI) URI guidelines.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Linguistic system

    Definition
    A system of signs, symbols, sounds, gestures, or rules used in communication, e.g. a language.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Legal Basis

    Definition
    Legal basis used to justify processing of data or use of technology in accordance with a law.
    Usage Note
    Legal bases (singular: legal basis) are defined by legislation and regulations, whose applicability is usually restricted to specific jurisdictions, which can be represented using dpv:hasJurisdiction or dpv:hasLaw. A legal basis can be used without such declarations, e.g. 'Consent'; however, its interpretation will require association with a law, e.g. 'EU GDPR'.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Literal

    Definition
    A literal value such as a string or integer; Literals may be typed, e.g. as a date according to xsd:date. Literals that contain human-readable text have an optional language tag as defined by BCP 47 [[rfc5646]].
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Media Type

    Definition
    A file format or physical medium.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Media Type or Extent

    Definition
    A media type or extent.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Period of Time

    Definition
    An interval of time that is named or defined by its start and end dates.
    Reference in DCAT
    Link
    Properties
    For this entity the following properties are defined: beginning , end , end date , start date .
    Property Range Card Definition Usage Reuse
    beginning Time instant 0..1 The beginning of a period or interval. A
    end Time instant 0..1 The end of a period or interval. A
    end date Temporal Literal 0..1 The end of the period. A
    start date Temporal Literal 0..1 The start of the period. A
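
    Both start date and end date are optional in the table above, so a period may be open ended. The sketch below parses the two values and rejects a period whose start lies after its end. It is a simplified illustration that only accepts full calendar dates (the xsd:date form); the function name check_period is hypothetical.

```python
from datetime import date


def check_period(start_date=None, end_date=None):
    """Parse optional ISO dates and ensure the start is not after the end.

    Either bound may be omitted to express an open-ended period.
    """
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    if start is not None and end is not None and start > end:
        raise ValueError("start date is after end date")
    return start, end


print(check_period("2012-01-01", "2020-12-31"))
print(check_period("2012-01-01"))  # open-ended period, no end date
```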

    Personal data

    Definition
    Data directly or indirectly associated or related to an individual.
    Usage Note
    This definition of personal data encompasses the concepts used in GDPR Art.4-1 for 'personal data' and ISO/IEC 2700 for 'personally identifiable information (PII)'.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Policy

    Definition
    A non-empty group of Permissions and/or Prohibitions.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Provenance Statement

    Definition
    A statement of any changes in ownership and custody of a resource since its creation that are significant for its authenticity, integrity, and interpretation.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Purpose

    Definition
    This class represents the Purpose or Goal of processing data or using technology.
    Usage Note
    The purpose or goal here is intended to sufficiently describe the intention or objective of why the data or technology is being used, and should be broader than mere technical descriptions of achieving a capability. For example, "Analyse Data" is an abstract purpose with no indication of what the analysis is for, as compared to a purpose such as "Marketing" or "Service Provision", which provides clarity and comprehension of the 'purpose' and can be enhanced with additional descriptions.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Quality certificate

    Definition
    An annotation that associates a resource (especially, a dataset or a distribution) to another resource (for example, a document) that certifies the resource's quality according to a set of quality assessment rules.
    Usage Note
    Instances of this class are annotations pointing to quality certificates.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Resource

    Definition
    Anything described by RDF.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Rights statement

    Definition
    A statement about the intellectual property rights (IPR) held in or over a resource, a legal document giving official permission to do something with a resource, or a statement about access rights.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Role

    Definition
    A role is the function of a resource or agent with respect to another resource, in the context of resource attribution or resource relationships.
    Reference in DCAT
    Link
    Usage Note
    Note it is a subclass of skos:Concept.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Standard

    Definition
    A standard or other specification to which a resource conforms.
    Properties
    This specification does not impose any additional requirements to properties for this entity.

    Datatypes

    The following datatypes are used within this specification.
    Class | Definition
    Temporal Literal | rdfs:Literal encoded using the relevant [[ISO8601]] Date and Time compliant string and typed using the appropriate XML Schema datatype (xsd:gYear, xsd:gYearMonth, xsd:date, or xsd:dateTime).
    Time instant | A temporal entity with zero extent or duration.
    xsd:boolean | Boolean has the ·value space· required to support the mathematical concept of binary-valued logic: {true, false}.
    xsd:dateTime | Object with integer-valued year, month, day, hour and minute properties, a decimal-valued second property, and a boolean timezoned property.
    xsd:decimal | Decimal represents a subset of the real numbers, which can be represented by decimal numerals. The ·value space· of decimal is the set of numbers that can be obtained by multiplying an integer by a non-positive power of ten, i.e., expressible as i × 10^-n where i and n are integers and n >= 0.
    xsd:duration | Duration represents a duration of time. The ·value space· of duration is a six-dimensional space where the coordinates designate the Gregorian year, month, day, hour, minute, and second components defined in § 5.5.3.2 of [[ISO8601]], respectively.
    xsd:hexBinary | Hex-encoded binary data. The ·value space· of hexBinary is the set of finite-length sequences of binary octets.
    xsd:nonNegativeInteger | Number derived from integer by setting the value of minInclusive to be 0.
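
    The Temporal Literal definition above asks for the "appropriate" XML Schema datatype. As an illustration only, the following Python sketch picks one of the four named datatypes from the lexical form of a value. The function name and the regular expressions are assumptions made for this example; they check the shape of the string, not calendar validity (a month of 13 would pass).

```python
import re

# Optional time zone suffix shared by all four lexical forms.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"

# Lexical shapes of the datatypes named in the Temporal Literal definition.
_PATTERNS = [
    ("xsd:gYear", re.compile(r"-?\d{4,}" + _TZ)),
    ("xsd:gYearMonth", re.compile(r"-?\d{4,}-\d{2}" + _TZ)),
    ("xsd:date", re.compile(r"-?\d{4,}-\d{2}-\d{2}" + _TZ)),
    ("xsd:dateTime",
     re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ)),
]


def temporal_datatype(lexical):
    """Return the matching datatype name, or None if no shape matches."""
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(lexical):
            return name
    return None
```

    For instance, temporal_datatype("2023-05") selects xsd:gYearMonth, while a string such as "17/05/2023" matches none of the shapes and yields None.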

    Controlled Vocabularies

    Requirements for controlled vocabularies

    The following is a list of requirements that were identified for the controlled vocabularies to be recommended in this Application Profile. Controlled vocabularies SHOULD:

    These criteria are not intended to define a set of requirements for controlled vocabularies in general; they are only intended to be used for the selection of the controlled vocabularies that are proposed for this Application Profile.

    Expected usage of controlled vocabularies

    To increase interoperability, the value spaces of properties can be further harmonised using shared controlled vocabularies. This kind of restriction may be subject to varying interpretations of what the expected usage of a controlled vocabulary is. To ensure a common interpretation, the following expectations are defined:

    The first two (and preferably also the others) SHOULD only be used for requirements that can be verified by a machine. If the validation can only be realised with the involvement of humans, then a less strong requirement (RECOMMENDED or MAY) is used, even if the intention is to be very strict. This is to ensure that the provided SHACL representations correspond closely to all possible use cases. Stronger enforcement is left to implementations, as they have control over the actual data exchange.

    Controlled vocabularies to be used

    The tables below show the usage of the codelists with the expected qualification.

    Properties with controlled vocabularies that MUST be used for the listed properties

    Property URI | Used for Class | Vocabulary name | Usage note
    healthdcatap:healthCategory | Dataset | Health Categories (EHDS Art. 51) NAL | The list of terms for the health category to which this dataset belongs as described in the Commission Regulation on the European Health Data Space laying down a list of categories of electronic data for secondary use, Art. 51
    healthdcatap:healthTheme | Dataset | Health Theme NAL | The list of terms for health themes that apply to the dataset.
    dct:type | Agent (Publisher) | Health Publisher Types (EHDS) | The list of terms for the Publisher Types based on TEHDAS feedback.
    prov:wasGeneratedBy | Dataset | Health Activity NAL | The list of terms for the activity type to indicate the dataset production context at various levels of granularity.
    dct:accrualPeriodicity | Dataset | EU Vocabularies Frequency Named Authority List |
    dct:language | Catalogue, Dataset, Distribution | EU Vocabularies Languages Named Authority List |
    dct:format | Distribution, Data Service | EU Vocabularies File Type Named Authority List |
    adms:status | Distribution | EU Vocabularies Distribution Status |
    dcatap:availability | Distribution | Distribution availability vocabulary | The list of terms for the availability levels of a dataset distribution in the DCAT-AP specification.
    dct:accessRights | Dataset, Data Service | Access Rights Named Authority List | Use one of the following values (:PUBLIC, :RESTRICTED, :NON_PUBLIC).
    dcat:theme | Dataset | Dataset Theme Vocabulary | The values to be used for this property are the URIs of the concepts in the vocabulary.
    dcat:mediaType | Distribution | IANA Media Types |
    spdx:algorithm | Checksum | Checksum algorithm members | The members listed are considered a controlled vocabulary of supported checksum algorithms.
    compression format | Distribution | IANA Media Types | The members listed are considered a controlled vocabulary of supported compression formats.
    packaging format | Distribution | IANA Media Types | The members listed are considered a controlled vocabulary of supported packaging formats.

    * For IANA media-type IRIs, SHACL enforcement is implemented via regex pattern (warning severity) due to lack of an official RDF list; the constraint remains normative “MUST”.
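
    The note above says that IANA media-type IRIs are checked with a regular expression at warning severity. The actual pattern used in the SHACL shapes is not reproduced in this section, so the Python sketch below uses an assumed pattern and an assumed IRI layout (https://www.iana.org/assignments/media-types/ followed by type/subtype) purely to illustrate the idea of a pattern check that reports a warning rather than a violation.

```python
import re

# Assumed IRI layout; the pattern in the published shapes may differ.
IANA_MEDIA_TYPE_IRI = re.compile(
    r"https?://www\.iana\.org/assignments/media-types/"
    r"[a-z]+/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
)


def check_media_type(iri):
    """Return a list of (severity, message) findings; empty when the IRI fits."""
    if IANA_MEDIA_TYPE_IRI.fullmatch(iri):
        return []
    # Mirrors the note: the finding is a warning, not a hard violation.
    return [("warning", "not an IANA media-type IRI: " + iri)]
```

    A bare media type string such as "text/csv" produces one warning, whereas the full IRI form passes without findings.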

    Properties with AT LEAST 1 value from the controlled vocabularies

    Property URI | Used for Class | Vocabulary name | Usage note
    dcat:themeTaxonomy | Catalogue | EU Vocabularies Data Theme Named Authority List | Multiple taxonomies may be listed, but at least the EU Data Theme (http://publications.europa.eu/resource/authority/data-theme) must be present.

    Properties with a RECOMMENDED use of controlled vocabularies

    Property URI | Used for Class | Vocabulary name | Usage note
    dct:type | Dataset | Dataset-type authority table | This list of terms provides types of datasets. Its main scope is to support dataset categorisation of the EU Open Data Portal.
    dct:type | Licence Document | ADMS licence type vocabulary | The list of terms in the ADMS licence type vocabulary is included in the ADMS specification.

    Properties that MAY use controlled vocabularies

    Property URI | Used for Class | Vocabulary name | Usage note
    dct:publisher | Catalogue | EU Vocabularies Corporate bodies Named Authority List | The Corporate bodies NAL must be used for European institutions and a small set of international organisations. In case of other types of organisations, national, regional or local vocabularies should be used.
    dct:spatial | Catalogue, Dataset | EU Vocabularies Continents Named Authority List, EU Vocabularies Countries Named Authority List, EU Vocabularies Places Named Authority List, Geonames | The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used.

    Other controlled vocabularies

    In addition to the proposed common controlled vocabularies, which are mandatory to ensure minimal interoperability, implementers are encouraged to publish and to use further region or domain-specific vocabularies that are available online. While those may not be recognised by general implementations of the Application Profile, they may serve to increase interoperability across applications in the same region or domain. Examples are the full set of concepts in EuroVoc, the CERIF standard vocabularies, the Dewey Decimal Classification and numerous other schemes.

    Agent Roles

    The first version of DCAT Application Profile [[vocab-dcat-1]] had a single property to relate an Agent (typically, an organisation) to a Dataset. The only such ‘agent role’ that could be expressed in that version of the profile is through the property publisher, defined as “An entity responsible for making the dataset available”. A second property is available in that DCAT recommendation [[vocab-dcat-1]], contact point, defined as “Link a dataset to relevant contact information which is provided using vCard”, but this is not an agent role as the value of this property is contact data, rather than a representation of the organisation as such. In specific cases, for example in exchanging data among domain-specific portals, it may be useful to express other, more specific agent roles. In such cases, extensions to DCAT-AP may be defined using additional properties with more specific meanings.

    Two possible approaches have been discussed, particularly in the context of the development of the domain-specific GeoDCAT Application Profile [[geodcat-ap]]. The first possible approach is based on the use of a predicate vocabulary that provides a set of properties that represent additional types of relationships between Datasets and Agents. For example, properties could be defined, such as foo:owner, foo:curator or foo:responsibleParty, in addition to the use of existing well-known properties, such as dct:creator and dct:rightsHolder. A possible source for such additional properties is the Roles Named Authority List maintained by the Publications Office of the EU. Other domain-specific sources for additional properties are the INSPIRE Responsible Party roles, the Library of Congress’ MARC relators and DataCite’s contributor types. To enable the use of such properties, they must be defined as RDF properties with URIs in a well-managed namespace. GeoDCAT-AP has introduced a number of specific properties.

    A second approach is based on the use of W3C’s PROV ontology [[prov-o]], which provides a powerful mechanism to express a set of classes, properties, and restrictions that can be used to represent and interchange provenance information generated in different systems and under different contexts. In the context of work on GeoDCAT-AP, a PROV-conformant solution for expressing agent roles was agreed. This solution uses prov:qualifiedAttribution in combination with a dct:type assertion pointing to the code list for Responsible Party Role in the INSPIRE registry. To enable the use of such types, they must be defined with URIs in a well-managed namespace.

    Based on the experience gained with the use of domain-specific extensions for additional ‘agent roles’ in the exchange of information about Datasets and on the requests of implementors and stakeholders, the DCAT Application Profile release 2.0.0 is extended with additional roles as proposed by DCAT Version 2 [[vocab-dcat-2]] that have proven to be useful across domains. Specifically, the properties creator, qualified attribution and qualified relation have been added to the Dataset class to further facilitate relationships between datasets and agents.

    In the most recent DCAT Version 3 [[vocab-dcat-3]] a dedicated section on the relationship with Agents is provided. The DCAT-AP guidelines for Agent Roles are conformant to this.

    As a technical note: while both approaches represent equivalent effort for publishers of datasets and editors of the specifications overall, their governance models differ. For the first approach, using direct properties, the relationship definition occurs within the profile or extension. For the second approach, using codelists, the governance is external. This may lead to unwanted overhead, e.g. to manage mappings between the distinct codelists that different profiles may use. The difference between the two is primarily operational and shows when querying the data catalogue: direct properties support simple queries, whereas codelists necessitate more complex queries, often extended with dynamic mapping logic between the external lists.

    It should be noted that, even if the second approach is used in a particular implementation, the provision of information using dct:publisher for the Catalogue is still mandatory under the rules laid down in the Conformance Statement in section [[[#conformance]]], while the provision of information using dct:publisher is strongly recommended for Dataset. The provision of such information using dct:publisher will ensure interoperability with implementations that use the basic approach of DCAT-AP. As long as the first approach is actively used, the second approach is thus limited to a profile-specific context. For that reason, it is recommended to use the direct properties, including the ones defined by GeoDCAT-AP such as custodian, distributor, originator, principalInvestigator, processor and resourceProvider, in preference to the second approach using prov:qualifiedAttribution.
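
    To make the operational difference between the two approaches concrete, the following Python sketch models the same custodian relationship both ways and queries each. It is a minimal illustration, not part of the profile: triples are plain tuples, every "ex:" identifier and the "inspire-role:" prefix are made up for the example, and the helper functions are hypothetical.

```python
# Approach 1: a direct property links the dataset to the agent.
DIRECT = {
    ("ex:dataset1", "dct:publisher", "ex:agencyA"),
    ("ex:dataset1", "geodcatap:custodian", "ex:instituteB"),
}

# Approach 2: a qualified attribution node carries agent and role (dct:type).
QUALIFIED = {
    ("ex:dataset1", "dct:publisher", "ex:agencyA"),
    ("ex:dataset1", "prov:qualifiedAttribution", "_:attr1"),
    ("_:attr1", "prov:agent", "ex:instituteB"),
    ("_:attr1", "dct:type", "inspire-role:custodian"),
}


def custodians_direct(triples, dataset):
    """One pattern: dataset -> custodian property -> agent."""
    return {o for s, p, o in triples
            if s == dataset and p == "geodcatap:custodian"}


def custodians_qualified(triples, dataset, role="inspire-role:custodian"):
    """Three patterns: find attributions, filter by role code, fetch agents."""
    attributions = {o for s, p, o in triples
                    if s == dataset and p == "prov:qualifiedAttribution"}
    with_role = {s for s, p, o in triples
                 if s in attributions and p == "dct:type" and o == role}
    return {o for s, p, o in triples
            if s in with_role and p == "prov:agent"}
```

    Both functions return the same agent, but the qualified form needs a join over the attribution node and depends on the role code list in use, which is where mapping logic between code lists would enter.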

    Accessibility and Multilingual Aspects

    Accessibility in the context of this Application Profile is limited to information about the technical format of distributions of datasets. The properties dcat:mediaType and dct:format provide information that can be used to determine what software can be deployed to process the data. The accessibility of the data within the datasets needs to be taken care of by the software that processes the data and is outside of the scope of this Application Profile.

    Multilingual aspects related to this Application Profile concern all properties whose contents are expressed as strings (i.e. rdfs:Literal) with human-readable text. Wherever such properties are used, the string values are of one of two types:

    Wherever values of properties are expressed with either type of string, the property can be repeated with translations in the case of free text and with parallel versions in case of named entities. For free text, e.g. in the cases of titles, descriptions and keywords, the language tag is mandatory.

    Language tags to be used with rdfs:Literal are defined by BCP47 [[rfc5646]], which allows the use of the "t" extension for text transformations defined in RFC6497 [[rfc6497]] with the field "t0" indicating a machine translation.

    A language tag will look like: "en-t-es-t0-abcd", which conveys the information that the string is in English, translated from Spanish by machine translation using a tool named "abcd".
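
    As a rough illustration of how such a tag decomposes, the Python sketch below splits a tag into the content language, the source language and the "t0" mechanism. It is a simplified reading of the "t" extension that assumes well-formed input and ignores private-use subtags; the function name and the returned dictionary keys are choices made for this example.

```python
import re

# A "t" extension field key is one letter followed by one digit, e.g. "t0".
_FIELD_KEY = re.compile(r"[a-z][0-9]")


def parse_translation_tag(tag):
    """Split a language tag into language, source language and t0 mechanism."""
    subtags = tag.lower().split("-")
    result = {"language": tag.lower(), "source": None, "mechanism": None}
    if "t" not in subtags[1:]:
        return result  # no transformation extension present
    t_index = subtags.index("t", 1)
    result["language"] = "-".join(subtags[:t_index])
    source, fields, key = [], {}, None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            break  # another extension singleton ends the "t" extension
        if _FIELD_KEY.fullmatch(subtag):
            key = subtag
            fields[key] = []
        elif key is None:
            source.append(subtag)  # still inside the source language tag
        else:
            fields[key].append(subtag)
    result["source"] = "-".join(source) or None
    if "t0" in fields:
        result["mechanism"] = "-".join(fields["t0"]) or None
    return result
```

    Applied to "en-t-es-t0-abcd", the sketch reports language "en", source "es" and mechanism "abcd", matching the reading given above.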

    For named entities, the language tag is optional and should only be provided if the parallel version of the name is strictly associated with a particular language. For example, the name ‘European Union’ has parallel versions in all official languages of the union, while a name like ‘W3C’ is not associated with a particular language and has no parallel versions.

    For linking to different language versions of associated web pages (e.g. landing pages) or documentation, a content negotiation mechanism may be used whereby different content is served based on the Accept-Language header sent by the browser. Using such a mechanism, the link to the page or document can resolve to different language versions of the page or document.
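
    How a server chooses the language version is not prescribed here. The Python sketch below shows one simplified way to pick a version from an Accept-Language header value: it honours quality values and matches on the primary language subtag only. The function name, the matching rule and the default language are assumptions for illustration.

```python
def pick_language(accept_language, available, default="en"):
    """Pick the best available language for an Accept-Language header value."""
    ranges = []
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        lang = parts[0].strip().lower()
        if not lang:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranges.append((quality, lang))
    # Stable sort: equal qualities keep the order given in the header.
    ranges.sort(key=lambda pair: pair[0], reverse=True)
    for quality, lang in ranges:
        if quality <= 0:
            continue
        for candidate in available:
            same_primary = candidate.lower().split("-")[0] == lang.split("-")[0]
            if lang == "*" or same_primary:
                return candidate
    return default
```

    With the header value "nl-BE, fr;q=0.8, en;q=0.5" and versions available in English and French, the sketch selects the French version.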

    All the occurrences of the property dct:language, which can be repeated if the metadata is provided in multiple languages, MUST have a URI [[rfc3986]] as their object, not a literal string from the ISO 639 code list.

    How multilingual information is handled in systems, for example in indexing and user interfaces, is outside of the scope of this Application Profile.

    General usage guidelines

    Usage guide on Datasets, Distributions and Data Services

    The introduction of Data Services as first-class citizens in DCAT 2.0 raised questions about the usage of Data Services and Distributions. This section provides guidance for publishers on what to consider a Distribution and what to consider a Data Service.

    A first distinction between distributions and data services is their dependency on a dataset for their existence. A distribution cannot exist without its dataset: it is a specific representation of a dataset (cf. the W3C definition of Distribution). A data service, by contrast, is an entity in its own right. It provides access to datasets or it provides data processing functions. The independence also holds between the distributions of a dataset and the data service which provides access to that dataset. The distributions are not required to be the result of the data service operations, although they may be.

    Many of the properties of distributions are file oriented (downloadURL, format, byte size, checksum, modification date, ...). This information is less relevant for data services, where related information is present in a very different form and thus under different terminology. For instance, data services can provide format transformations, language transformations and schema transformations on request. The handling of trust is also different. While tampering with downloadable content is detected by e.g. checksums, data services often create a trusted channel using security measures such as authentication and encryption. This reduces the need for additional trust checks on the data.

    The differences between downloading a file and accessing the data through a service have resulted in the following guidelines:
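
    As an illustration of the checksum-based tamper check mentioned for file-oriented distributions, the Python sketch below recomputes a digest over downloaded bytes and compares it with a published hex value. It assumes the algorithm is identified by an SPDX-style IRI ending in the algorithm name; the function name and error handling are choices made for this example.

```python
import hashlib

# Assumed IRI layout for checksum algorithm identifiers.
SPDX_PREFIX = "http://spdx.org/rdf/terms#checksumAlgorithm_"


def verify_distribution(data, algorithm_iri, checksum_value):
    """Return True when the digest of `data` equals the published hex value."""
    if not algorithm_iri.startswith(SPDX_PREFIX):
        raise ValueError("unsupported algorithm IRI: " + algorithm_iri)
    name = algorithm_iri[len(SPDX_PREFIX):]
    if name not in hashlib.algorithms_available:
        raise ValueError("algorithm not available locally: " + name)
    digest = hashlib.new(name, data).hexdigest()
    # Hex digits are compared case-insensitively.
    return digest == checksum_value.lower()
```

    A consumer would call this with the bytes obtained from the download URL; a False result indicates that the content differs from what the publisher described.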

    Orthogonal to clarifying the nature of distributions and data services, there might be a need to clarify the granularity of datasets and distributions. At first sight, it is commonly expected that all distributions of a dataset are identical in content, differing only in the representation of the data. When dataset series are considered, however, this interpretation no longer seems valid. DCAT 3.0 addresses dataset series and dataset versioning. Implementors are advised to take this into account when creating guidelines for distributions. Note that this is less of an issue between datasets and data services, as both are independent entities. Data services usually address granularity by providing a query interface so that users can get the data according to their needs.

    These guidelines will be able to capture many access patterns, corresponding to most users' expectations. However, some cases may be less clear-cut. In such cases, the DCAT(-AP) community can be consulted for a recommended approach.

    Usage guide on Dataset Series

    A Dataset Series can be considered a message from the publisher that the data of a dataset evolves along one or more dimensions and that this evolution is available via a collection of independent, yet closely related, datasets.

    The need for sharing this grouping explicitly is strongly use case dependent. As it requires additional metadata management effort by the publisher, the use of Dataset Series is optional and should fit the publisher's objectives. For instance, if a publisher is sharing an actively updated dataset accessible via an API that provides current as well as historic data, then it is not mandatory to create metadata records for each yearly snapshot. Only if these snapshots are created intentionally and the publisher wants to share their life cycle with the public do Dataset Series come into the picture.

    In order to harmonise the use of Dataset Series, the following guidelines are to be considered:

    In general, it is expected that the members of a Dataset Series are strongly connected. However, there are no common criteria or rules for how this connection could be determined. Usually, the shared characteristics are expressed as a data domain, e.g. the population of bees, and some evolution in space and time, e.g. in Greece for the period of 2019-2023, and published by a single publisher. Nevertheless, other characteristics could give rise to the creation of a Dataset Series.

    The DCAT-AP working group has investigated how to identify and express unique characteristics of Datasets that are members of a Dataset Series. During these exchanges it became clear that today no consensus exists on restricting Dataset Series to a more limited scope. Therefore the DCAT-AP working group has decided to retract the notion of a Dataset Member of a Dataset Series (a subclass of DCAT-AP Datasets). Profile builders may reintroduce this notion when they want to express specific constraints for that usage scope.

    In case the connection is very strong or the result of an automated process, versioning terminology is a natural way to express the connection between the Datasets in the Dataset Series collection. For guidelines on expressing versioning information in DCAT-AP, consult the DCAT guidelines on versioning. As versioning is not always the most appropriate terminology, DCAT introduces properties (e.g. next, previous, inSeries, last, etc.) to interconnect the Dataset Series with its members and the members among themselves. It is recommended always to use these properties in combination with Dataset Series, and to use versioning when appropriate.

    The membership in a Dataset Series may influence the (descriptive) metadata to highlight differences from other Datasets in the collection. For instance, a common adaptation is the addition of the release date to the title.
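
    The interconnection properties can be pictured with a small Python sketch that walks a series from its last member back through the previous-member links. It is illustrative only: triples are plain tuples, the "ex:" identifiers are made up, and the property names dcat:inSeries, dcat:prev and dcat:last are written as compact strings rather than full URIs.

```python
# A fictitious series with three yearly members.
TRIPLES = {
    ("ex:bees-2021", "dcat:inSeries", "ex:bees-series"),
    ("ex:bees-2022", "dcat:inSeries", "ex:bees-series"),
    ("ex:bees-2023", "dcat:inSeries", "ex:bees-series"),
    ("ex:bees-2022", "dcat:prev", "ex:bees-2021"),
    ("ex:bees-2023", "dcat:prev", "ex:bees-2022"),
    ("ex:bees-series", "dcat:last", "ex:bees-2023"),
}


def members_newest_first(triples, series):
    """Follow dcat:last, then dcat:prev links, to list members newest first."""
    prev = {s: o for s, p, o in triples if p == "dcat:prev"}
    current = next(
        (o for s, p, o in triples if s == series and p == "dcat:last"), None
    )
    ordered, seen = [], set()
    while current is not None and current not in seen:  # guard against cycles
        ordered.append(current)
        seen.add(current)
        current = prev.get(current)
    return ordered
```

    Without the dcat:last and dcat:prev links, a consumer would only know which datasets belong to the series (via dcat:inSeries), not how they follow one another.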

    Support for implementation

    Implementing the DCAT-AP data specification in a data exchange between two systems also raises technical questions.

    Identifiers

    For implementers, the requirement to be able to exchange data as RDF impacts the design of the exchange format and the data management. One aspect, the use of identifiers, needs special attention as it influences the interpretation by others. DCAT-AP is used mostly in a harvesting network, in which one catalogue harvests from others. Through this process, the DCAT-AP metadata descriptions spread through the harvesting network. In contrast to "classical" data exchange patterns, where by default identifiers are locally scoped to the exchange context, the RDF format implicitly assumes global, publicly accessible identifiers (URIs). Harvesting enforces the latter. More on this identifier challenge and possible solution approaches is documented in the guidelines on identifier management in DCAT-AP.
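
    As a small illustration of the difference between locally scoped and global identifiers, the Python sketch below flags identifiers that are not absolute HTTP(S) URIs before metadata is exposed for harvesting. Treating only http and https URIs as "publicly accessible" is an assumption made for this example, as is the function name.

```python
from urllib.parse import urlsplit


def is_global_identifier(identifier):
    """True for absolute http(s) URIs, False for local keys and blank nodes."""
    parts = urlsplit(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

    A database key such as "dataset-42" or a blank node label such as "_:b0" fails the check, which signals that the identifier would lose its meaning once the description leaves the originating catalogue.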

    Validation

    To verify whether the data exchange is (technically) conformant to HealthDCAT-AP, the exchanged data can be validated using the provided SHACL shapes. SHACL is a W3C Recommendation for expressing constraints on an RDF knowledge graph. The provided SHACL shapes make it possible to check whether a HealthDCAT-AP catalogue expressed in an RDF serialization is valid.

    More on the validation and the provided SHACL shapes can be found in section [[[#validation-of-healthdcat-ap]]].

    Validation of HealthDCAT-AP

    Validation of datasets against HealthDCAT-AP is implemented as a two-stage approach that recognises HealthDCAT-AP as an extension of DCAT-AP. First, each dataset is validated against the baseline DCAT-AP specification, whose shapes are imported via owl:imports statements. Subsequently, HealthDCAT-AP-specific constraints are applied, covering all levels of validation: base, vocabularies, recommended properties and ranges.

    The following SHACL templates are provided:
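
    The two-stage idea can be sketched in Python without a SHACL engine. The toy checks below stand in for the real shapes and are assumptions for illustration only: the baseline stage checks that a title and description are present, and the extension stage checks the access rights value against the three values listed in the controlled vocabulary table. A dataset is modelled as a plain dictionary.

```python
ACCESS_RIGHTS = {"PUBLIC", "RESTRICTED", "NON_PUBLIC"}


def check_dcat_ap(dataset):
    """Stage 1 stand-in: baseline checks inherited from DCAT-AP."""
    return ["missing " + prop
            for prop in ("dct:title", "dct:description")
            if not dataset.get(prop)]


def check_healthdcat_ap(dataset):
    """Stage 2 stand-in: a constraint specific to the health extension."""
    if dataset.get("dct:accessRights") not in ACCESS_RIGHTS:
        return ["dct:accessRights must be PUBLIC, RESTRICTED or NON_PUBLIC"]
    return []


def validate(dataset):
    """Run both stages in order and label each finding with its stage."""
    report = []
    stages = (("DCAT-AP", check_dcat_ap), ("HealthDCAT-AP", check_healthdcat_ap))
    for stage, check in stages:
        report.extend((stage, message) for message in check(dataset))
    return report
```

    Labelling each finding with its stage makes it visible whether a record already fails the baseline profile or only the health-specific additions.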

    The shapes constraint files provide for each class mentioned in HealthDCAT-AP, and having additional properties defined, a template with the corresponding constraints. Class membership constraints are not present in the shapes constraints files. These are collected in the range constraint file.

    To validate a health dataset, additional data, such as the controlled vocabularies, might need to be imported into the validator. These have to be retrieved from the appropriate places. As support, the following files express the imports (not transitive) according to the SHACL specification, and can be loaded into the Interoperable Europe Testbed.

    HealthDCAT-AP Examples

    This section describes the use of Dataset Series via examples. All examples are fictitious and are created to facilitate the understanding of Dataset Series. This section complements the usage explained in section Dataset series in DCAT 3 [[vocab-dcat-3]], because it illustrates the decision making process when switching from publishing a single Dataset to a Dataset Series.

    This section illustrates the HealthDCAT-AP specification with RDF examples for key properties and their usage.

    Title (dct:title)
    Other Identifier (adms:identifier)
    Sample (adms:sample)
    Version Notes (adms:versionNotes)
    Applicable Legislation (dcatap:applicableLegislation)
    Contact Point (dcat:contactPoint)
    Dataset Distribution (dcat:distribution)
    Has Version (dcat:hasVersion)
    Keyword (dcat:keyword)
    Landing Page (dcat:landingPage)
    Qualified Relation (dcat:qualifiedRelation)
    Spatial Resolution (dcat:spatialResolutionInMeters)
    Temporal Resolution (dcat:temporalResolution)
    Theme (dcat:theme)
    Version (dcat:version)
    Access Rights (dct:accessRights)
    Frequency (dct:accrualPeriodicity)
    Alternative (dct:alternative)
    Conforms To (dct:conformsTo)
    Creator (dct:creator)
    Description (dct:description)
    Identifier (dct:identifier)
    Is Referenced By (dct:isReferencedBy)
    Release Date (dct:issued)
    Language (dct:language)
    Modification Date (dct:modified)
    Provenance (dct:provenance)
    Publisher (dct:publisher)
    Related Resource (dct:relation)
    Source (dct:source)
    Geographical Coverage (dct:spatial)
    Temporal Coverage (dct:temporal)
    Type (dct:type)
    Legal Basis (dpv:hasLegalBasis)
    Personal Data (dpv:hasPersonalData)
    Purpose (dpv:hasPurpose)
    Quality Annotation (dqv:hasQualityAnnotation)
    Documentation (foaf:page)
    Analytics (healthdcatap:analytics)
    Code Values (healthdcatap:hasCodeValues)
    Coding System (healthdcatap:hasCodingSystem)
    Health Data Access Body (healthdcatap:hdab)
    Health Category (healthdcatap:healthCategory)
    Health Theme (healthdcatap:healthTheme)
    Maximum Typical Age (healthdcatap:maxTypicalAge)
    Minimum Typical Age (healthdcatap:minTypicalAge)
    Number Of Records (healthdcatap:numberOfRecords)
    Number Of Unique Individuals (healthdcatap:numberOfUniqueIndividuals)
    Population Coverage (healthdcatap:populationCoverage)
    Retention Period (healthdcatap:retentionPeriod)
    Qualified Attribution (prov:qualifiedAttribution)
    Was Generated By (prov:wasGeneratedBy)

    Deprecated properties and classes

    The following URIs used for properties in DCAT-AP release 2.x have been deprecated in DCAT-AP 3.0 [[vocab-dcat-3]] in favor of the URIs within the DCAT namespace.

    Acknowledgments

    The current version of HealthDCAT-AP was prepared by the European Commission. It builds upon the work developed by Sciensano [BE] - Pascal Derycke and the Norwegian Directorate of Health – Helsedirektoratet [NO] - Truls Korsgaard - under the EHDS2 Pilot project (EU4H-2021-PJ2 Project 101079839).

    We gratefully acknowledge the contributions made to this document by all members of the EHDS2 Pilot working group & SEMIC.