Semantic Web & Metadata

Resources and further reading

Semantic Web Fundamentals

Dublin Core

There are two Dublin Core schemas, both available by default in Tropy: the Dublin Core Metadata Element Set (DCES) and the Dublin Core Metadata Terms (DCTerms).

The Dublin Core Metadata Element Set is a set of 15 basic elements designed for general-purpose description of digital resources. It is simple, universal, and widely supported.

  1. Title
  2. Creator
  3. Subject
  4. Description
  5. Publisher
  6. Contributor
  7. Date
  8. Type
  9. Format
  10. Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage (geographic or temporal)
  15. Rights

Usage: Designed to be simple and generic, and suitable for libraries, archives, and museums without complex customisation. DCES is an ISO standard (ISO 15836) and is widely used for metadata interoperability on the web.

Namespace: http://purl.org/dc/elements/1.1/
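As an illustration, each of the 15 elements maps directly onto a URI in the DCES namespace. This Python sketch uses only the standard library to serialise a simple record as N-Triples. The item URI and field values are hypothetical, and the escaping covers only the common characters.

```python
# Minimal sketch: a simple DCES record serialised as N-Triples (stdlib only).
# The item URI and values used below are hypothetical examples.
DC = "http://purl.org/dc/elements/1.1/"

# The 15 DCES element names as they appear in the namespace.
DCES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def dces_to_ntriples(subject_uri: str, record: dict) -> list:
    """Return one N-Triples line per field; non-DCES names raise ValueError."""
    lines = []
    for element, value in record.items():
        if element not in DCES_ELEMENTS:
            raise ValueError(f"not a DCES element: {element}")
        lines.append(f'<{subject_uri}> <{DC}{element}> "{escape_literal(value)}" .')
    return lines


record = {"title": "Letter to a colleague", "date": "1895", "language": "fr"}
for line in dces_to_ntriples("http://example.org/items/1", record):
    print(line)
```

All values here are plain strings. That is the main limitation DCTerms addresses.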

Dublin Core Metadata Terms is an extension of DCES. It includes the 15 base elements (in more formalised forms) plus a much wider set of additional terms, qualifiers, and more precise concepts.

DCTerms includes:

  • The 15 base elements (more formally defined)
  • Additional elements such as Audience, Provenance, AccrualMethod
  • Qualifiers for greater precision — e.g. Date can be qualified as dcterms:created, dcterms:modified, dcterms:issued
  • Typed values — dates as xsd:date, URIs for places, licenses, etc.

Key additions over DCES:

Property              Use
dcterms:created       Creation date (typed)
dcterms:modified      Last modification date
dcterms:spatial       Place as a URI (e.g. GeoNames)
dcterms:temporal      Time period
dcterms:license       License URI (e.g. Creative Commons)
dcterms:isPartOf      Link to a parent collection
dcterms:provenance    Custody history

Usage: Suited to more complex or specialised contexts, such as advanced digital libraries, archives, Semantic Web projects, and databases that require rich structure. DCTerms metadata is typically expressed in RDF for Semantic Web compatibility.

Namespace: http://purl.org/dc/terms/
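To make "typed values" concrete, this minimal Python sketch writes three DCTerms statements as N-Triples:

- a creation date typed as xsd:date;
- a place given as a URI rather than a string;
- a licence given as a URI rather than a string.

The item URI and date are hypothetical. The TGN URI for Paris is the one listed in the Getty section of this page. The Creative Commons URI is used only as an example licence.

```python
# Minimal sketch: DCTerms statements with typed values, as N-Triples.
# The item URI and date are hypothetical; the licence URI is only an example.
from datetime import date

DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def typed_date(d: date) -> str:
    """An xsd:date typed literal, e.g. "1895-03-14"^^<...#date>."""
    return f'"{d.isoformat()}"^^<{XSD_DATE}>'


def uri(value: str) -> str:
    """A URI node, as opposed to a plain string literal."""
    return f"<{value}>"


item = "http://example.org/items/1"
statements = [
    ("created", typed_date(date(1895, 3, 14))),
    ("spatial", uri("http://vocab.getty.edu/tgn/7008038")),  # Paris (TGN)
    ("license", uri("https://creativecommons.org/licenses/by/4.0/")),
]
for prop, obj in statements:
    print(f"<{item}> <{DCTERMS}{prop}> {obj} .")
```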

Key differences — DCES vs DCTerms:

                      DCES                          DCTerms
Number of elements    15 base elements              15 + extended terms & qualifiers
Complexity            Simple, general               More flexible and specific
Typed values          No                            Yes (xsd:date, URIs…)
Qualifiers            No                            Yes
Best for              Simple resource description   Rich, semantic, interoperable data
Note

In Tropy: both schemas are available by default. Use DCES for simple projects; switch to DCTerms when you want to link fields to URIs (GeoNames, Getty TGN/ULAN, controlled vocabularies) for full Linked Data interoperability.
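When moving a DCES-based project to DCTerms, each of the 15 elements has a DCTerms property with the same local name: dc:title becomes dcterms:title, and so on. This minimal Python sketch performs that mapping. An optional, hypothetical `refine` table handles cases where you know a more precise term applies, for example a date that is really a creation date. Whether such a refinement is correct depends on your data.

```python
# Minimal sketch: map DCES property URIs to their DCTerms counterparts.
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def to_dcterms(prop_uri: str, refine: Optional[dict] = None) -> str:
    """Swap the DCES namespace for DCTerms, keeping the local name.

    `refine` optionally maps a DCES local name to a more specific DCTerms
    local name (hypothetical example: {"date": "created"}).
    """
    if not prop_uri.startswith(DC):
        return prop_uri  # leave non-DCES properties untouched
    local = prop_uri[len(DC):]
    local = (refine or {}).get(local, local)
    return DCTERMS + local


print(to_dcterms(DC + "title"))                      # http://purl.org/dc/terms/title
print(to_dcterms(DC + "date", {"date": "created"}))  # http://purl.org/dc/terms/created
```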

Official references:


Europeana Data Model (EDM)

EDM is the metadata schema used by Europeana to describe digital cultural heritage objects according to Semantic Web standards.

Key features:

  • RDF-based: built on the W3C Resource Description Framework
  • Separation of entities: distinguishes the cultural object, its digital representation, and the aggregation that links them, alongside contextual entities:
    • ProvidedCHO: the cultural heritage object itself (painting, manuscript…)
    • WebResource: the digital file (image, audio, video)
    • Aggregation: links the ProvidedCHO to its WebResources and carries provider and rights information
    • Agent: a person or organisation with a role
    • Place and TimeSpan: contextualise the object in space and time
  • Multi-representation: supports multiple digital versions of a single object
  • Aligned with other standards and protocols: Dublin Core, LIDO, EAD, OAI-PMH

EDM is the schema to study if you plan to contribute data to Europeana, consume data from it, or model your corpus along the same principles.
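This minimal Python sketch shows how the three core EDM entities fit together, written as N-Triples. The ProvidedCHO is linked to the WebResource for its image through an ore:Aggregation, which also carries provider and rights statements. All example.org URIs, the title, and the provider name are hypothetical, and values are assumed not to need escaping. The EDM documentation lists the full set of mandatory properties.

```python
# Minimal sketch of the core EDM pattern as N-Triples. All example.org URIs,
# the title and the provider name are hypothetical; values are not escaped.
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def edm_record(cho, image, aggregation, title, provider, rights):
    return [
        # The cultural heritage object itself, with descriptive metadata.
        # edm:type is one of TEXT, IMAGE, SOUND, VIDEO, 3D; IMAGE is assumed.
        f"<{cho}> <{RDF_TYPE}> <{EDM}ProvidedCHO> .",
        f'<{cho}> <{DC}title> "{title}" .',
        f'<{cho}> <{EDM}type> "IMAGE" .',
        # The digital file that represents it.
        f"<{image}> <{RDF_TYPE}> <{EDM}WebResource> .",
        # The aggregation ties the two together and records provider and rights.
        f"<{aggregation}> <{RDF_TYPE}> <{ORE}Aggregation> .",
        f"<{aggregation}> <{EDM}aggregatedCHO> <{cho}> .",
        f"<{aggregation}> <{EDM}isShownBy> <{image}> .",
        f'<{aggregation}> <{EDM}dataProvider> "{provider}" .',
        f"<{aggregation}> <{EDM}rights> <{rights}> .",
    ]


for line in edm_record(
    cho="http://example.org/cho/1",
    image="http://example.org/images/1.jpg",
    aggregation="http://example.org/aggregation/1",
    title="Portrait of a woman",
    provider="Example Museum",
    rights="http://creativecommons.org/publicdomain/mark/1.0/",
):
    print(line)
```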

🔗 pro.europeana.eu/page/edm-documentation


EXIF — Exchangeable Image File Format

EXIF is a set of technical metadata properties embedded directly in image files, notably JPEG and TIFF. Unlike Dublin Core, it does not aim to provide a semantic description. Instead, it gives a technical record of the conditions under which the image was captured.

EXIF data typically includes:

  • Camera model, lens, shutter speed, aperture, ISO
  • Date and time of capture
  • GPS coordinates (if enabled on the device)
  • Image dimensions (pixels), resolution, colour space
  • File size
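EXIF stores the capture date (the DateTimeOriginal tag) as a string of the form YYYY:MM:DD HH:MM:SS, which is not ISO 8601. Reusing it in a typed date field such as dcterms:created requires a conversion along the lines of the minimal Python sketch below. DateTimeOriginal itself carries no time zone, so the result is a local, time-zone-naive timestamp.

```python
# Minimal sketch: EXIF DateTimeOriginal ("YYYY:MM:DD HH:MM:SS") to ISO 8601.
from datetime import datetime


def exif_datetime_to_iso(value: str) -> str:
    """Convert an EXIF date-time string to ISO 8601 (no time zone).

    Trailing NUL bytes and spaces, which EXIF strings may carry, are stripped.
    """
    parsed = datetime.strptime(value.strip("\x00 "), "%Y:%m:%d %H:%M:%S")
    return parsed.isoformat()


print(exif_datetime_to_iso("2021:06:15 14:32:07"))  # 2021-06-15T14:32:07
```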
Note

In Tropy: EXIF properties are used in photo-level templates (not item-level). Tropy automatically extracts the following from each imported file:

  • Filename
  • Date created
  • Image dimensions in pixels
  • File size in bytes

You can add EXIF properties to a custom photo template to surface additional technical data (GPS coordinates, camera model, etc.) directly in the description panel.
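EXIF records GPS latitude and longitude as three rational numbers (degrees, minutes, seconds), plus a hemisphere reference: N/S for latitude, E/W for longitude. GeoNames and most web maps expect decimal degrees instead. This minimal Python sketch does the conversion. It assumes the values have already been read from the file as numbers, Fractions, or "num/den" strings, and the coordinates used are example values.

```python
# Minimal sketch: EXIF GPS (degrees, minutes, seconds) + reference to decimal degrees.
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """Convert a (degrees, minutes, seconds) triple to signed decimal degrees.

    Components may be ints, Fractions or "num/den" strings; ref is the EXIF
    GPSLatitudeRef or GPSLongitudeRef value, where "S" and "W" are negative.
    """
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref.strip().upper() in ("S", "W") else value)


lat = dms_to_decimal((48, 51, 24), "N")
lon = dms_to_decimal((2, 21, 8), "E")
print(round(lat, 5), round(lon, 5))  # 48.85667 2.35222
```

Fractions keep the arithmetic exact until the final conversion to float.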

The Wikipedia article Exchangeable image file format provides a concrete example and a full list of the main EXIF properties.


CIDOC-CRM


Controlled Vocabularies

GeoNames

Pactols

Getty Vocabularies

Published as Linked Open Data by the Getty Research Institute, the Getty Vocabularies are the standard reference for art history, architecture, and cultural heritage documentation. All terms have stable URIs under http://vocab.getty.edu/.

AAT — Art & Architecture Thesaurus

~50,000 concepts covering object types, styles, materials, techniques, and other notions used in art and architectural description.

http://vocab.getty.edu/aat/300263552  →  oil paintings
http://vocab.getty.edu/aat/300014109  →  watercolors
http://vocab.getty.edu/aat/300015012  →  manuscripts

🔗 vocab.getty.edu/aat/

TGN — Thesaurus of Geographic Names

Over 2 million place entries — historical and current, from cities to archaeological sites. Unlike GeoNames (which focuses on current places), TGN is especially strong on historical toponyms and ancient sites.

http://vocab.getty.edu/tgn/7011179  →  Salamanca, Spain
http://vocab.getty.edu/tgn/7008038  →  Paris, France
http://vocab.getty.edu/tgn/7001386  →  Constantinople / Istanbul

🔗 vocab.getty.edu/tgn/

ULAN — Union List of Artist Names

Biographical and bibliographic information on artists, architects, decorators, and other makers — with variant name forms across languages and periods.

http://vocab.getty.edu/ulan/500010570  →  Francisco de Goya
http://vocab.getty.edu/ulan/500115588  →  El Greco
http://vocab.getty.edu/ulan/500021099  →  Rembrandt van Rijn

🔗 vocab.getty.edu/ulan/

Linked Data access

All Getty Vocabularies are queryable through a public SPARQL endpoint (http://vocab.getty.edu/sparql) and are available in multiple RDF serialisations.
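This minimal Python sketch prepares a SPARQL request against the Getty endpoint, asking for the English preferred label of one of the AAT concepts listed above. It follows the standard SPARQL protocol: a GET request with a `query` parameter. Two points are assumptions to check against the Getty's own documentation:

- that labels are exposed as language-tagged skos:prefLabel values;
- that the endpoint honours the JSON results Accept header.

The network call is left commented out so the sketch runs offline.

```python
# Minimal sketch: build (but do not send) a SPARQL request for an AAT label.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://vocab.getty.edu/sparql"


def label_query(concept_uri: str, lang: str = "en") -> str:
    """SPARQL query for the preferred label(s) of a concept in one language."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{concept_uri}> skos:prefLabel ?label .\n"
        f'  FILTER(langMatches(lang(?label), "{lang}"))\n'
        "}"
    )


def build_request(query: str) -> Request:
    """GET request with the query as a URL parameter, asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(label_query("http://vocab.getty.edu/aat/300015012"))
# To actually run it:
# import json, urllib.request
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"]["bindings"])
```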

Tip

In Tropy: import the Getty AAT vocabulary (http://vocab.getty.edu/aat.nt) into Preferences → Vocabularies to use AAT properties in your templates. Use ULAN URIs in dc:creator or dcterms:creator fields, and TGN URIs in dcterms:spatial (alongside or instead of GeoNames).
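Before pasting URIs into Tropy fields, a quick check can catch a ULAN URI that ended up in dcterms:spatial, or a TGN URI in a creator field. This is a minimal Python sketch. The field-to-vocabulary table is a hypothetical project convention mirroring the tip above. Only Getty URIs are recognised, so a GeoNames URI in dcterms:spatial, which the tip also allows, would need its own rule.

```python
# Minimal sketch: check that a Getty URI comes from the vocabulary expected
# for a given property. The EXPECTED table is a hypothetical project convention.
import re

GETTY_URI = re.compile(r"http://vocab\.getty\.edu/(aat|tgn|ulan)/(\d+)")

EXPECTED = {
    "http://purl.org/dc/elements/1.1/creator": "ulan",
    "http://purl.org/dc/terms/creator": "ulan",
    "http://purl.org/dc/terms/spatial": "tgn",
}


def parse_getty_uri(uri: str):
    """Return (vocabulary, id) for a Getty URI, or None if it is not one."""
    match = GETTY_URI.fullmatch(uri.strip())
    return (match.group(1), match.group(2)) if match else None


def is_expected_getty_value(property_uri: str, value_uri: str) -> bool:
    """True if value_uri is a Getty URI from the vocabulary expected for property_uri."""
    parsed = parse_getty_uri(value_uri)
    expected = EXPECTED.get(property_uri)
    return parsed is not None and expected is not None and parsed[0] == expected


print(is_expected_getty_value("http://purl.org/dc/terms/creator",
                              "http://vocab.getty.edu/ulan/500010570"))  # True
print(is_expected_getty_value("http://purl.org/dc/terms/spatial",
                              "http://vocab.getty.edu/ulan/500010570"))  # False
```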

Loterre — Linked Open TERminology REsources

Published by Inist-CNRS, Loterre is a multidisciplinary platform hosting 70+ scientific terminologies as SKOS/RDF Linked Open Data — fully FAIR-compliant and freely accessible.

Unlike Openthéso (a tool for managing thesauri) or Pactols (a single domain-specific thesaurus), Loterre is a portal that aggregates terminologies from many disciplines and institutions, with a particular emphasis on the French research ecosystem.

Vocabularies relevant for SSH and CHORAL research:

Vocabulary                            Scope
Art et Archéologie                    Visual arts, monuments, archaeological objects
Ethnologie                            Peoples, cultures, practices
Histoire et sciences des religions    Religious history and practices
Linguistique                          Linguistic concepts and terms
Littérature                           Literary forms, genres, movements
Géographie de l’Amérique du Nord      North American place names
Pays et subdivisions                  Countries and administrative divisions

Access:

  • Browse & search: loterre.fr
  • SPARQL endpoint and REST API available for all vocabularies
  • Download in SKOS/RDF-XML, Turtle, JSON-LD, or CSV
  • Recently integrated into the ISTEX infrastructure for text and data mining
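Once a vocabulary has been downloaded in SKOS/RDF-XML, its preferred labels can be extracted with the Python standard library alone. This minimal sketch assumes the common layout, where each concept is written as a skos:Concept element with an rdf:about attribute. RDF/XML allows other, equivalent layouts, so a full RDF parser such as rdflib is more robust for real use. The sample data and URIs are hypothetical.

```python
# Minimal sketch: extract skos:prefLabel values from SKOS RDF/XML (stdlib only).
# Handles only concepts written as <skos:Concept rdf:about="...">; the sample
# data and URIs are hypothetical.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/voc/amphora">
    <skos:prefLabel xml:lang="fr">amphore</skos:prefLabel>
    <skos:prefLabel xml:lang="en">amphora</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml: str, lang: str) -> dict:
    """Map each concept URI to its preferred label in the requested language."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        for label in concept.findall(f"{{{SKOS}}}prefLabel"):
            if label.get(XML_LANG) == lang:
                labels[uri] = label.text
    return labels


print(pref_labels(SAMPLE, "en"))  # {'http://example.org/voc/amphora': 'amphora'}
```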
Tip

Loterre also offers tooling services for vocabulary producers: validate a SKOS file, transform from other formats, or align your vocabulary with an existing Loterre resource. Useful if you want to publish your own controlled vocabulary as Linked Open Data.
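If you plan to publish your own vocabulary, the starting point is a SKOS file. This minimal Python sketch writes a tiny concept scheme in Turtle, using skos:inScheme, skos:broader, skos:topConceptOf, and language-tagged skos:prefLabel. All URIs and labels are hypothetical, and labels are assumed not to contain quotes. The output has not been checked against Loterre's own validation requirements.

```python
# Minimal sketch: write a tiny SKOS concept scheme as Turtle.
# All URIs and labels are hypothetical; labels are assumed not to need escaping.
def skos_turtle(scheme_uri: str, title: str, concepts: list) -> str:
    """concepts: list of (uri, pref_label, lang, broader_uri or None) tuples."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme ;",
        f'    dcterms:title "{title}" .',
    ]
    for uri, label, lang, broader in concepts:
        lines += ["", f"<{uri}> a skos:Concept ;", f"    skos:inScheme <{scheme_uri}> ;"]
        if broader:
            lines.append(f"    skos:broader <{broader}> ;")
        else:
            # Concepts without a broader term are declared as top concepts.
            lines.append(f"    skos:topConceptOf <{scheme_uri}> ;")
        lines.append(f'    skos:prefLabel "{label}"@{lang} .')
    return "\n".join(lines)


print(skos_turtle(
    "http://example.org/voc",
    "Example ceramics vocabulary",
    [
        ("http://example.org/voc/vessel", "vessel", "en", None),
        ("http://example.org/voc/amphora", "amphora", "en", "http://example.org/voc/vessel"),
    ],
))
```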

🔗 loterre.fr · loterre.istex.fr

Huma-Num Openthéso


JSON-LD