Semantic Web & Metadata
Resources and further reading
Semantic Web Fundamentals
Recommended Readings
- W3C RDF Primer — the official introduction to the Resource Description Framework: w3.org/TR/rdf11-primer
- Linked Data Patterns — a pattern catalogue for modelling, publishing, and consuming Linked Data: patterns.dataincubator.org
- Programming the Semantic Web (Segaran et al.) — practical introduction with code examples
Dublin Core
Tropy ships with two Dublin Core schemas by default: the Dublin Core Metadata Element Set (DCES) and the DCMI Metadata Terms (DCTerms).
The Dublin Core Metadata Element Set is a set of 15 basic elements designed for general-purpose description of digital resources. It is simple, universal, and widely supported.
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage (geographic or temporal)
- Rights
Usage: Designed to be simple and generic, so it suits libraries, archives, and museums without complex customisation. DCES is also an ISO standard (ISO 15836) and supports semantic interoperability on the web.
Namespace: http://purl.org/dc/elements/1.1/
Dublin Core Metadata Terms is an extension of DCES. It includes the 15 base elements (in more formalised forms) plus a much wider set of additional terms, qualifiers, and more precise concepts.
DCTerms includes:
- The 15 base elements (more formally defined)
- Additional elements such as Audience, Provenance, AccrualMethod…
- Qualifiers for greater precision, e.g. Date can be qualified as dcterms:created, dcterms:modified, dcterms:issued
- Typed values: dates as xsd:date, URIs for places, licenses, etc.
Key additions over DCES:
| Property | Use |
|---|---|
| dcterms:created | Creation date (typed) |
| dcterms:modified | Last modification date |
| dcterms:spatial | Place as a URI (e.g. GeoNames) |
| dcterms:temporal | Time period |
| dcterms:license | License URI (e.g. Creative Commons) |
| dcterms:isPartOf | Link to a parent collection |
| dcterms:provenance | Custody history |
Usage: Suited to more complex or specialised contexts such as advanced digital libraries, archives, Semantic Web projects, and databases that require rich structure. Expressed in RDF for Semantic Web compatibility.
Namespace: http://purl.org/dc/terms/
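
To make the typed values and URI values described above concrete, here is a minimal sketch of a DCTerms description serialised as JSON-LD with Python's standard library. The item URI, title, and date are hypothetical and exist only for illustration; the TGN URI is the Paris example listed further down this page.

```python
import json

DCTERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical record: the item URI, title, and date are invented for this sketch.
record = {
    "@context": {"dcterms": DCTERMS, "xsd": XSD},
    "@id": "https://example.org/items/42",
    "dcterms:title": "Letter to the editor",
    # Typed literal: the date carries an explicit xsd:date datatype.
    "dcterms:created": {"@value": "1923-05-14", "@type": "xsd:date"},
    # URI values point at external authorities instead of repeating strings.
    "dcterms:spatial": {"@id": "http://vocab.getty.edu/tgn/7008038"},
    "dcterms:license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
}

document = json.dumps(record, indent=2)
print(document)
```

The printed document can be pasted into the JSON-LD Playground to inspect the expanded form.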
Key differences — DCES vs DCTerms:
| | DCES | DCTerms |
|---|---|---|
| Number of elements | 15 base elements | 15 + extended terms & qualifiers |
| Complexity | Simple, general | More flexible and specific |
| Typed values | No | Yes (xsd:date, URIs…) |
| Qualifiers | No | Yes |
| Best for | Simple resource description | Rich, semantic, interoperable data |
In Tropy: both schemas are available by default. Use DCES for simple projects; switch to DCTerms when you want to link fields to URIs (GeoNames, Getty TGN/ULAN, controlled vocabularies) for full Linked Data interoperability.
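
Because each of the 15 DCES elements has a counterpart with the same local name in the DCTerms namespace, moving a property from one schema to the other is a namespace swap. The helper below is a small illustrative sketch; the function name `to_dcterms` is hypothetical and is not part of Tropy.

```python
DCES_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Local names of the 15 DCES elements.
DCES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dcterms(uri: str) -> str:
    """Return the DCTerms property URI matching a DCES property URI."""
    if not uri.startswith(DCES_NS):
        raise ValueError(f"not a DCES property: {uri}")
    local_name = uri[len(DCES_NS):]
    if local_name not in DCES_ELEMENTS:
        raise ValueError(f"unknown DCES element: {local_name}")
    return DCTERMS_NS + local_name


print(to_dcterms("http://purl.org/dc/elements/1.1/creator"))
```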
Official references:
- DCES specification: dublincore.org/specifications/dublin-core/dces/
- DCTerms specification: dublincore.org/specifications/dublin-core/dcmi-terms/
- JSON-LD context: purl.org/dc/terms/
- JSON-LD Playground: json-ld.org/playground/
Europeana Data Model (EDM)
EDM is the metadata schema used by Europeana to describe digital cultural heritage objects according to Semantic Web standards.
Key features:
- RDF-based — built on the W3C Resource Description Framework
- Separation of entities — distinguishes between the cultural object, its digital representation, and the aggregation:
  - ProvidedCHO: the cultural heritage object itself (painting, manuscript…)
  - WebResource: the digital file (image, audio, video)
  - Aggregation: groups metadata from different sources
  - Agent: a person or organisation with a role
  - Place and TimeSpan: contextualise the object in space and time
- Multi-representation — supports multiple digital versions of a single object
- Aligned with other standards — Dublin Core, LIDO, EAD, OAI-PMH
EDM is the schema to study if you plan to contribute to or consume data from Europeana, or model your corpus along the same principles.
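
The separation of entities can be pictured with a few plain Python dataclasses. This is only a sketch of the idea, not EDM's RDF serialisation: the class names follow the entities listed above, while the field names (`uri`, `title`, `mime_type`, `data_provider`) and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvidedCHO:
    """The cultural heritage object itself."""
    uri: str
    title: str


@dataclass
class WebResource:
    """One digital file representing the object."""
    uri: str
    mime_type: str


@dataclass
class Aggregation:
    """Ties one object to its digital representations and its provider."""
    uri: str
    cho: ProvidedCHO
    web_resources: List[WebResource] = field(default_factory=list)
    data_provider: str = ""


# Hypothetical example: one manuscript with two digital versions.
manuscript = ProvidedCHO("https://example.org/cho/1", "Illuminated psalter")
aggregation = Aggregation(
    uri="https://example.org/aggregation/1",
    cho=manuscript,
    web_resources=[
        WebResource("https://example.org/files/1.tif", "image/tiff"),
        WebResource("https://example.org/files/1.jpg", "image/jpeg"),
    ],
    data_provider="Example Library",
)
print(len(aggregation.web_resources), "representations of", aggregation.cho.title)
```

Keeping the object and its files in separate records is what allows several digital versions to hang off a single description.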
🔗 pro.europeana.eu/page/edm-documentation
EXIF — Exchangeable Image File Format
EXIF is a set of technical metadata properties embedded directly inside image files (such as JPEG and TIFF). Unlike Dublin Core, it does not aim to provide a semantic description but rather a technical record of the conditions under which the image was captured.
EXIF data typically includes:
- Camera model, lens, shutter speed, aperture, ISO
- Date and time of capture
- GPS coordinates (if enabled on the device)
- Image dimensions (pixels), resolution, colour space
- File size
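
EXIF stores GPS latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference, whereas most mapping tools expect decimal degrees. The conversion can be sketched as follows; the input tuples are hypothetical sample values, and reading them from a real file would need an image library that is not shown here.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (numerator, denominator) DMS rationals to decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical sample values in the shape EXIF uses.
latitude = dms_to_decimal(((48, 1), (51, 1), (2412, 100)), "N")
longitude = dms_to_decimal(((2, 1), (21, 1), (324, 100)), "E")
print(round(latitude, 4), round(longitude, 4))
```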
In Tropy: EXIF properties are used in photo-level templates (not item-level). Tropy automatically extracts the following from each imported file:
- Filename
- Date created
- Image dimensions in pixels
- File size in bytes
You can add EXIF properties to a custom photo template to surface additional technical data (GPS coordinates, camera model, etc.) directly in the description panel.
The Wikipedia article "Exchangeable image file format" provides a concrete example and a full list of the main EXIF properties.
CIDOC-CRM
- Official site: cidoc-crm.org
- Current specification (v7.1.3): cidoc-crm.org/sites/default/files/cidoc_crm_version_7.1.3.pdf
- Linked Art (CIDOC-CRM profile for art): linked.art
- CIDOC-CRM Tutorial by George Bruseker: available on the CIDOC-CRM website
Controlled Vocabularies
GeoNames
- Search interface: geonames.org/search.html
- API documentation: geonames.org/export/web-services.html
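
As a rough sketch of how the web services can be called, the snippet below builds a request URL for the GeoNames JSON search service. It only constructs the URL and sends nothing. The `username` value is a placeholder: GeoNames requires a registered account name, and the helper name `geonames_search_url` is hypothetical.

```python
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_search_url(place, username, max_rows=5):
    """Build a GeoNames full-text search URL for a place name."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    return f"{GEONAMES_SEARCH}?{urlencode(params)}"


# "your_username" is a placeholder for a registered GeoNames account.
url = geonames_search_url("Salamanca", username="your_username")
print(url)
```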
Pactols
- Thesaurus browser: pactols.frantiq.fr
- Frantiq network: frantiq.fr
Getty Vocabularies
Published as Linked Open Data by the Getty Research Institute, the Getty Vocabularies are the standard reference for art history, architecture, and cultural heritage documentation. All terms have stable URIs under http://vocab.getty.edu/.
AAT — Art & Architecture Thesaurus
~50,000 terms covering object types, styles, materials, techniques, and concepts used in art and architectural description.
http://vocab.getty.edu/aat/300263552 → oil paintings
http://vocab.getty.edu/aat/300014109 → watercolors
http://vocab.getty.edu/aat/300015012 → manuscripts
TGN — Thesaurus of Geographic Names
Over 2 million place entries — historical and current, from cities to archaeological sites. Unlike GeoNames (which focuses on current places), TGN is especially strong on historical toponyms and ancient sites.
http://vocab.getty.edu/tgn/7011179 → Salamanca, Spain
http://vocab.getty.edu/tgn/7008038 → Paris, France
http://vocab.getty.edu/tgn/7001386 → Constantinople / Istanbul
ULAN — Union List of Artist Names
Biographical and bibliographic information on artists, architects, decorators, and other makers — with variant name forms across languages and periods.
http://vocab.getty.edu/ulan/500010570 → Francisco de Goya
http://vocab.getty.edu/ulan/500115588 → El Greco
http://vocab.getty.edu/ulan/500021099 → Rembrandt van Rijn
Linked Data access
All Getty Vocabularies are queryable via SPARQL and available in multiple RDF serialisations:
- SPARQL endpoint: vocab.getty.edu/sparql
- Search interface: vocab.getty.edu
- Documentation: getty.edu/research/tools/vocabularies/lod/
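
A query against the SPARQL endpoint can be prepared with the standard library alone. The sketch below builds, but does not send, a request for the English preferred label of one AAT concept. It assumes the endpoint follows the standard SPARQL protocol (a `query` parameter and JSON results negotiated through the Accept header) and that labels are exposed as `skos:prefLabel`; check the Getty documentation before relying on either.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://vocab.getty.edu/sparql"


def label_query(concept_uri, lang="en"):
    """Return a SPARQL query for the preferred labels of one concept."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{concept_uri}> skos:prefLabel ?label .\n"
        f'  FILTER(langMatches(lang(?label), "{lang}"))\n'
        "}"
    )


def build_request(query):
    """Wrap a query in a GET request asking for JSON results."""
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(label_query("http://vocab.getty.edu/aat/300263552"))
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query; that step is left out so the sketch runs offline.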
In Tropy: import the Getty AAT vocabulary (http://vocab.getty.edu/aat.nt) into Preferences → Vocabularies to use AAT properties in your templates. Use ULAN URIs in dc:creator or dcterms:creator fields, and TGN URIs in dcterms:spatial (alongside or instead of GeoNames).
Loterre — Linked Open TERminology REsources
Published by Inist-CNRS, Loterre is a multidisciplinary platform hosting 70+ scientific terminologies as SKOS/RDF Linked Open Data — fully FAIR-compliant and freely accessible.
Unlike Openthéso (a tool for managing thesauri) or Pactols (a single domain-specific thesaurus), Loterre is a portal that aggregates terminologies from many disciplines and institutions, with a particular emphasis on the French research ecosystem.
Vocabularies relevant for SSH and CHORAL research:
| Vocabulary | Scope |
|---|---|
| Art et Archéologie | Visual arts, monuments, archaeological objects |
| Ethnologie | Peoples, cultures, practices |
| Histoire et sciences des religions | Religious history and practices |
| Linguistique | Linguistic concepts and terms |
| Littérature | Literary forms, genres, movements |
| Géographie de l’Amérique du Nord | North American place names |
| Pays et subdivisions | Countries and administrative divisions |
Access:
- Browse & search: loterre.fr
- SPARQL endpoint and REST API available for all vocabularies
- Download in SKOS/RDF-XML, Turtle, JSON-LD, or CSV
- Recently integrated into the ISTEX infrastructure for text and data mining
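
Once a vocabulary has been downloaded as SKOS RDF/XML, its labels can be pulled out with a few lines of Python. The sketch below uses an inline, hypothetical two-label sample rather than a real Loterre file, and it assumes concepts are serialised as `skos:Concept` elements. RDF/XML allows other layouts, so a dedicated RDF library is the safer choice for real data.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Hypothetical sample, not taken from an actual Loterre vocabulary.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://example.org/vocab/c1">
    <skos:prefLabel xml:lang="fr">manuscrit</skos:prefLabel>
    <skos:prefLabel xml:lang="en">manuscript</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml, lang):
    """Map each concept URI to its preferred label in the given language."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for concept in root.findall("skos:Concept", NS):
        for label in concept.findall("skos:prefLabel", NS):
            if label.get(XML_LANG) == lang:
                labels[concept.get(RDF_ABOUT)] = label.text
    return labels


print(pref_labels(SAMPLE, "en"))
```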
Loterre also offers tooling services for vocabulary producers: validate a SKOS file, transform from other formats, or align your vocabulary with an existing Loterre resource. Useful if you want to publish your own controlled vocabulary as Linked Open Data.
Huma-Num Openthéso
- Platform: opentheso.huma-num.fr
- GitHub (open-source): github.com/MOM-CNRS/Opentheso
- SKOS documentation (W3C): w3.org/TR/skos-reference/
JSON-LD
- Official spec: json-ld.org
- JSON-LD Playground (test your JSON-LD): json-ld.org/playground/
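
A JSON-LD context maps short prefixes to full namespace IRIs, so that a key such as `dcterms:title` stands for `http://purl.org/dc/terms/title`. The sketch below illustrates that single idea with a hand-rolled prefix expansion. It is a deliberate simplification and not the JSON-LD expansion algorithm; the function names and the sample node are hypothetical.

```python
def expand_term(term, context):
    """Expand a compact IRI such as 'dcterms:title' using a prefix map."""
    prefix, separator, local_name = term.partition(":")
    if separator and prefix in context:
        return context[prefix] + local_name
    # Keywords (@id, @type) and unknown prefixes are returned unchanged.
    return term


def expand_keys(node):
    """Return a copy of a flat JSON-LD node with its keys expanded."""
    context = node.get("@context", {})
    return {
        expand_term(key, context): value
        for key, value in node.items()
        if key != "@context"
    }


# Hypothetical node for illustration.
node = {
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "@id": "https://example.org/items/42",
    "dcterms:title": "Letter to the editor",
}
print(expand_keys(node))
```

For real documents, the JSON-LD Playground or a full JSON-LD processor handles nested contexts, typed values, and the other cases this sketch ignores.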