Organizing and Curating Image Corpora

with Tropy and Arvest

Julien Rabaud

SCD & Pôle Numérique - UPPA

March 10, 2026

– Agenda

Part 1 — Conceptual Foundations

The Semantic Web
Triples & URIs · RDF
Metadata schemas
Controlled vocabularies

Part 2 — IIIF

What is IIIF?
IIIF in the wild

Part 3 — Tools

Tropy
Arvest

Part 1
– Conceptual Foundations

– Two Webs

The Web we know

Pages written for humans

Search engines index words

Links connect documents

The Semantic Web

Data readable by machines

Links connect things

A global knowledge graph

Note

Tim Berners-Lee, 2001 — the goal: make the Web a universal medium for data exchange, not just document retrieval.

– Triples

Everything in the Semantic Web is a triple:

Subject — Predicate — Object

<Photo_001>   dc:creator             "Julien Rabaud"
<Photo_001>   dcterms:spatial        <geonames:2988507>
<Photo_001>   cidoc:P138_represents  <wd:Q5783414>

Tip

Each resource can be both subject and object — triples form an interconnected graph of knowledge.
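To make the graph idea concrete, here is a toy sketch in Python, not a real triple store. Triples are plain tuples, and the `rdfs:label` triple for Paris is an extra added for illustration. Following two edges (photo → place → label) only works because the GeoNames resource is the object of one triple and the subject of another.

```python
from __future__ import annotations

# Toy triple store: each triple is a (subject, predicate, object) tuple.
# Prefixed names are kept as plain strings; real RDF stores index triples properly.
TRIPLES = [
    ("Photo_001", "dc:creator", "Julien Rabaud"),
    ("Photo_001", "dcterms:spatial", "geonames:2988507"),
    ("Photo_001", "cidoc:P138_represents", "wd:Q5783414"),
    # Illustrative extra triple: the place is now a *subject* as well.
    ("geonames:2988507", "rdfs:label", "Paris"),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def place_label(photo: str) -> str | None:
    """Follow two edges: photo -> place URI -> label of that place."""
    for _, _, place in match(s=photo, p="dcterms:spatial"):
        for _, _, label in match(s=place, p="rdfs:label"):
            return label
    return None


print(place_label("Photo_001"))  # prints: Paris
```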

– URIs: Naming Things

A URI is a unique, global, persistent identifier for any thing.

https://sws.geonames.org/2988507/          →  Paris
https://www.wikidata.org/entity/Q7186      →  Marie Curie
http://vocab.getty.edu/aat/300015646       →  photographs

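Every URL is a URI, but the emphasis differs: a URL tells you where a page is, while a URI here names the thing itself (a city, a person, a concept), in any language, and is meant to stay stable.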

Tip

When you put a GeoNames URI in a Tropy metadata field, you’re not typing a string — you’re linking your data to a global knowledge graph.

– RDF: The Standard for Triples

Turtle — human-readable:

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<https://myarchive.org/photo/001>
  dc:title   "Cloister of S. Domingo" ;
  dc:creator "Julien Rabaud" ;
  dc:date    "2024-09-15" .

JSON-LD, the format Tropy exports:

{
  "@context": { "dc": "http://purl.org/dc/elements/1.1/" },
  "@id": "https://myarchive.org/photo/001",
  "dc:title": "Cloister of S. Domingo",
  "dc:creator": "Julien Rabaud"
}

Note

Tropy exports metadata as JSON-LD, with every field mapped to a property URI: your descriptions are already Linked Data, even if you don’t think of them that way.
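A simplified sketch of what a JSON-LD processor does with the `dc:` keys, assuming the document declares the `dc` prefix in an `@context`. Real JSON-LD expansion handles much more (terms, `@vocab`, nested nodes); this only shows why a compact key like `dc:title` is really a full property URI. The helper names are hypothetical.

```python
import json

# The JSON-LD example, with its "dc" prefix declared in an @context.
DOC = json.loads("""
{
  "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
  "@id": "https://myarchive.org/photo/001",
  "dc:title": "Cloister of S. Domingo",
  "dc:creator": "Julien Rabaud"
}
""")


def expand_key(key: str, context: dict) -> str:
    """Expand a compact 'prefix:suffix' key using the context's prefixes."""
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return key  # unknown prefix or no prefix: left unchanged in this sketch


def to_triples(doc: dict) -> list[tuple[str, str, str]]:
    """Turn one flat JSON-LD node into (subject, predicate, object) triples."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    return [
        (subject, expand_key(key, context), value)
        for key, value in doc.items()
        if not key.startswith("@")  # skip keywords such as @context and @id
    ]


for triple in to_triples(DOC):
    print(triple)
```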

Metadata Schemas

How do we describe what we see?

Dublin Core

15 universal elements — usable for any resource type.

Property	Usage
`dc:title`	Title
`dc:creator`	Photographer / author
`dc:contributor`	Other contributors
`dc:publisher`	Publisher / institution
`dc:date`	Date of creation
`dc:subject`	Topic / theme
`dc:description`	Free-text note
`dc:type`	Resource type

Property	Usage
`dc:format`	JPEG, TIFF…
`dc:identifier`	Shelfmark / ID
`dc:source`	Archive, repository
`dc:language`	Language of visible text
`dc:coverage`	Place or period
`dc:relation`	Related items
`dc:rights`	License / copyright

🔗 dublincore.org · http://purl.org/dc/elements/1.1/
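Because the element set is closed, a record can be checked mechanically. A minimal sketch (the record and the `unknown_dc_keys` helper are illustrative) that flags `dc:` keys outside the 15 elements, such as a mistyped `dc:author`:

```python
# The 15 elements of the Dublin Core Element Set (the two tables above).
DC_ELEMENTS = frozenset({
    "title", "creator", "contributor", "publisher", "date", "subject",
    "description", "type", "format", "identifier", "source", "language",
    "coverage", "relation", "rights",
})


def unknown_dc_keys(record: dict) -> list[str]:
    """Return the 'dc:' keys of a record that are not Dublin Core elements."""
    return sorted(
        key for key in record
        if key.startswith("dc:") and key[len("dc:"):] not in DC_ELEMENTS
    )


record = {"dc:title": "Cloister of S. Domingo", "dc:author": "Julien Rabaud"}
print(unknown_dc_keys(record))  # prints: ['dc:author']
```

`dc:author` does not exist in Dublin Core; the element for a photographer or author is `dc:creator`.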

– Dublin Core Terms (DCTERMS)

Richer, typed versions of DC — plus extra properties.

# Basic DC — just a string
dc:date "2024-09-15"

# DCTERMS — typed and linkable
dcterms:created   "2024-09-15"^^xsd:date
dcterms:spatial   <geonames:3117735>
dcterms:license   <https://creativecommons.org/licenses/by/4.0/>
dcterms:isPartOf  <myArchive:collection_42>

Key additions:

dcterms:spatial — place URI
dcterms:temporal — time period
dcterms:license — explicit URI
dcterms:isPartOf — collection link

Tip

Rule of thumb: use dc: when a plain string is enough — use dcterms: whenever you want to link to a URI (a place, a license, a collection). DCTerms is what makes your data truly interoperable.
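The difference between a plain string, a typed literal and a URI becomes visible when the same statements are written out as N-Triples, the simplest RDF syntax (full IRIs, no prefixes). A small sketch with hypothetical helper names (`iri`, `literal`, `triple`):

```python
from __future__ import annotations

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value: str) -> str:
    """An IRI term: written between angle brackets."""
    return f"<{value}>"


def literal(value: str, datatype: str | None = None) -> str:
    """A literal term, optionally typed with ^^<datatype>."""
    # N-Triples requires escaping backslashes, double quotes and line breaks.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    text = f'"{escaped}"'
    return f"{text}^^{iri(datatype)}" if datatype else text


def triple(s: str, p: str, o: str) -> str:
    """One N-Triples statement; subject and predicate are IRIs."""
    return f"{iri(s)} {iri(p)} {o} ."


photo = "https://myarchive.org/photo/001"
print(triple(photo, DC + "date", literal("2024-09-15")))                        # plain string
print(triple(photo, DCTERMS + "created", literal("2024-09-15", XSD + "date")))  # typed date
print(triple(photo, DCTERMS + "license",
             iri("https://creativecommons.org/licenses/by/4.0/")))             # link
```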

– CIDOC-CRM

The international reference ontology for cultural heritage data (ISO 21127), used by museums and major archives and influential on aggregators such as Europeana.

Unlike Dublin Core, CIDOC-CRM models events — not just documents.

Note

Key insight: A photograph wasn’t just taken —
it was taken by someone, somewhere, at a moment, of something.

Everything important in CIDOC-CRM happens.

– CIDOC-CRM — Core Classes

Class	Meaning
`E22`	Human-Made Object
`E31`	Document / Photograph
`E39`	Actor (person or group)

Class	Meaning
`E52`	Time-Span
`E53`	Place
`E65`	Creation event

<Creation_001>  cidoc:P14_carried_out_by  <orcid:0000-…>
<Creation_001>  cidoc:P7_took_place_at    <geonames:3117735>
<Creation_001>  cidoc:P4_has_time-span    <2024-09-15>

Tip

In Tropy, you can import the CIDOC-CRM vocabulary and build event-oriented templates from these properties.
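A sketch of the shift from document-centric to event-centric description: one flat record becomes a small graph around a creation event. The node names (`Creation_…`, `TimeSpan_…`), the `Person_001` placeholder, and the use of `P94_has_created` and `P82_at_some_time_within` are illustrative modelling choices, not a Tropy template.

```python
def creation_triples(photo: str, actor: str, place: str, date: str) -> list[tuple[str, str, str]]:
    """Model 'who took this photo, where, and when' as a CIDOC-CRM creation event."""
    event = f"Creation_{photo}"   # hypothetical node naming scheme
    span = f"TimeSpan_{photo}"
    return [
        (event, "rdf:type", "cidoc:E65_Creation"),
        (event, "cidoc:P94_has_created", photo),
        (event, "cidoc:P14_carried_out_by", actor),
        (event, "cidoc:P7_took_place_at", place),
        (event, "cidoc:P4_has_time-span", span),
        (span, "rdf:type", "cidoc:E52_Time-Span"),
        (span, "cidoc:P82_at_some_time_within", date),
    ]


# "Person_001" stands in for the photographer's identifier (e.g. an ORCID).
for t in creation_triples("Photo_001", "Person_001", "geonames:3117735", "2024-09-15"):
    print(*t)
```

The photo itself appears only once, as the object of `P94_has_created`: everything else hangs off the event, which is the point of the model.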

Which Schema for Your Template?

Your sources	Recommended schema	Level
General archival documents	Dublin Core Terms	Item
Correspondence, diaries	Tropy Correspondence (DC)	Item
Cultural heritage objects	CIDOC-CRM	Item
Visual works (art history)	VRA Core	Item
Technical image data	EXIF	Photo
Crop / detail of interest	DC or custom	Selection

Note

These schemas are not mutually exclusive — a single template can mix properties from several vocabularies.
Example: dc:title + dcterms:spatial + cidoc:P138_represents in the same item template.

Controlled Vocabularies

Consistent, linked values for metadata fields

– The Problem with Free Text

Typing “Paris” yourself:

“Paris”
“paris”
“Paris, France”
“Paris (France)”

→ 4 different strings, no shared meaning

Using a URI:

dcterms:spatial
  <geonames:2988507>

→ Always Paris
→ In any language
→ With coordinates
→ Linked to all other data about Paris

Note

Controlled vocabularies = shared dictionaries.
When everyone uses the same URI, datasets become interoperable.
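A sketch of the reconciliation step that maps the four variants to one GeoNames URI. The lookup table and the normalisation rule are hand-made assumptions; in practice you would search GeoNames and check each match.

```python
from __future__ import annotations

import re

# Hand-made lookup table for illustration; real URIs come from a GeoNames search.
PLACE_URIS = {"paris": "https://sws.geonames.org/2988507/"}


def normalize(label: str) -> str:
    """Lower-case and drop everything after a comma or an opening parenthesis."""
    text = label.strip().lower()
    return re.sub(r"\s*[(,].*$", "", text)


def place_uri(label: str) -> str | None:
    """Return the URI for a free-text place name, or None if unknown."""
    return PLACE_URIS.get(normalize(label))


variants = ["Paris", "paris", "Paris, France", "Paris (France)"]
print({place_uri(v) for v in variants})  # prints: {'https://sws.geonames.org/2988507/'}
```

The crude rule would also turn "Paris, Texas" into Paris, France. That is exactly why reconciliation needs a human check, and why storing the URI once beats re-parsing strings later.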

– GeoNames

Over 12 million geographic names — all with stable URIs.

https://sws.geonames.org/2988507/  →  Paris, France
https://sws.geonames.org/3117735/  →  Madrid, Spain
https://sws.geonames.org/6440564/  →  Anglet, France

Entries typically include alternate names in many languages, coordinates, administrative hierarchy, and feature type.

Tip

In Tropy: use GeoNames URIs in the dcterms:spatial field.
🔗 geonames.org/search.html

– Pactols

Multilingual thesaurus for Archaeology, Classical and Oriental Studies — maintained by the Frantiq network.

Covers: archaeological periods · object types · materials · ancient places · historical figures

Note

Especially relevant for CHORAL research — Romance cultures, Mediterranean and Iberian heritage, classical antiquity.

🔗 pactols.frantiq.fr

– Getty Vocabularies

Published as Linked Open Data by the Getty Research Institute — the standard reference for art history and cultural heritage.

Vocabulary	Scope
AAT	Art & Architecture Thesaurus — styles, materials, techniques, object types
TGN	Thesaurus of Geographic Names — historical & current places
ULAN	Union List of Artist Names — artists, architects, makers

vocab.getty.edu/aat/300263552
  →  oil paintings

vocab.getty.edu/tgn/7011179
  →  Salamanca

vocab.getty.edu/ulan/500010570
  →  Francisco Goya

🔗 vocab.getty.edu

– Loterre

Linked Open TERminology REsources — published by Inist-CNRS.

A multidisciplinary platform hosting 70+ scientific terminologies as SKOS/RDF Linked Open Data.

Relevant for SSH researchers:

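Art et Archéologie (art and archaeology)
Ethnologie (ethnology)
Histoire et sciences des religions (history and study of religions)
Linguistique · Littérature (linguistics, literature)
Géographie de l’Amérique du Nord (geography of North America)
Pays et subdivisions (countries and subdivisions)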

All vocabularies are:

Free to consult & download
Available as SKOS/RDF, JSON-LD, CSV
Queryable via SPARQL & REST API
FAIR-compliant

🔗 loterre.fr

– Openthéso (Huma-Num instance)

A platform hosting many discipline-specific thesauri — terms expressed in SKOS.

Each term has:

a stable ARK identifier (URI)
skos:prefLabel — preferred label
skos:altLabel — synonyms
skos:broader / skos:narrower
skos:exactMatch → links to other thesauri
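The SKOS properties above are enough for a simple lookup: match a term against preferred and alternative labels, then climb `skos:broader`. The concepts, `ex:` identifiers and labels below are invented for illustration, not taken from an Openthéso thesaurus.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str                                        # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)    # skos:altLabel
    broader: str | None = None                             # skos:broader (a URI)


# Invented concepts, for illustration only.
THESAURUS = {c.uri: c for c in [
    Concept("ex:building", "building"),
    Concept("ex:religious_building", "religious building", broader="ex:building"),
    Concept("ex:monastery", "monastery", ["monasterio", "monastère"],
            broader="ex:religious_building"),
]}


def find(term: str) -> Concept | None:
    """Match a term against preferred and alternative labels, ignoring case."""
    wanted = term.casefold()
    for concept in THESAURUS.values():
        labels = [concept.pref_label, *concept.alt_labels]
        if wanted in (label.casefold() for label in labels):
            return concept
    return None


def broader_chain(concept: Concept) -> list[str]:
    """Preferred labels from a concept up to its top concept."""
    chain = [concept.pref_label]
    while concept.broader is not None:
        concept = THESAURUS[concept.broader]
        chain.append(concept.pref_label)
    return chain


print(broader_chain(find("Monastère")))  # prints: ['monastery', 'religious building', 'building']
```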

Hosted thesauri include:

Architectural heritage (MHFA)
Performing arts vocabulary
… and many others

🔗 opentheso.huma-num.fr

Part 2 – IIIF

International Image Interoperability Framework

– What is IIIF?

Open standards for rich access to digitized images — developed since 2011 by libraries, archives, and museums worldwide.

Note

The core promise:
Any IIIF-compliant viewer can display any IIIF-compliant image — regardless of where it is hosted.

Deep zoom · cropping · structured collections · shared annotations · multi-institutional exhibitions

– Image API

Standardizes how images are served.

{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

# Full image at 800px wide:
https://example.org/iiif/photo001/full/800,/0/default.jpg

# Just the top-left quarter:
https://example.org/iiif/photo001/pct:0,0,50,50/max/0/default.jpg

→ Deep zoom, cropping, resizing — all server-side, no file duplication.
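Because the Image API is just a URL template, request URLs can be generated. A minimal sketch: `iiif_image_url` is a hypothetical helper, and the server and identifier are the placeholders used above. It defaults to `max` for size, the Image API 3.0 keyword for the full-size image (version 2 also accepted `full`), and percent-encodes identifiers, since those containing `/` (such as ARKs) must be escaped.

```python
from urllib.parse import quote


def iiif_image_url(server: str, identifier: str, region: str = "full", size: str = "max",
                   rotation: int = 0, quality: str = "default", fmt: str = "jpg") -> str:
    """Build {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    ident = quote(identifier, safe="")  # e.g. "/" in an ARK becomes %2F
    return f"{server.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


base = "https://example.org/iiif"
# Full image at 800 px wide:
print(iiif_image_url(base, "photo001", size="800,"))
# Just the top-left quarter, at full resolution:
print(iiif_image_url(base, "photo001", region="pct:0,0,50,50"))
```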

– Presentation API: The Manifest

Standardizes how collections are described.

A Manifest is a JSON-LD document that groups images into a structured object.

Manifest
  └── Canvas  (= one "view" or "page")
       ├── Annotation  [painting]   →  the image
       └── Annotation  [commenting] →  your notes

Tip

A manuscript → 1 Manifest, 200 Canvases.
A photo series → 1 Manifest, 50 Canvases.
An altarpiece → 1 Manifest, front + back + details.
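A minimal sketch of the Manifest → Canvas → Annotation structure in IIIF Presentation API 3.0, where each painting annotation sits inside an AnnotationPage. The helper name, URLs and image sizes are placeholders, and every image is assumed to be JPEG. Commenting annotations would go in a separate `annotations` list on each Canvas.

```python
import json


def make_manifest(base: str, label: str, images: list[tuple[str, int, int]]) -> dict:
    """Minimal IIIF Presentation 3.0 Manifest: one Canvas per (url, width, height)."""
    canvases = []
    for n, (url, width, height) in enumerate(images, start=1):
        canvas_id = f"{base}/canvas/{n}"
        canvases.append({
            "id": canvas_id, "type": "Canvas", "width": width, "height": height,
            "items": [{                      # AnnotationPage of "painting" annotations
                "id": f"{base}/page/{n}", "type": "AnnotationPage",
                "items": [{
                    "id": f"{base}/annotation/{n}", "type": "Annotation",
                    "motivation": "painting",    # this annotation *is* the image
                    "body": {"id": url, "type": "Image", "format": "image/jpeg",
                             "width": width, "height": height},
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest", "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# A document photographed recto/verso: 1 Manifest, 2 Canvases.
manifest = make_manifest(
    "https://example.org/iiif/doc1", "Letter (recto and verso)",
    [("https://example.org/images/recto.jpg", 2000, 3000),
     ("https://example.org/images/verso.jpg", 2000, 3000)],
)
print(json.dumps(manifest, indent=2))
```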

IIIF — Europeana

3,000+ institutions aggregated — IIIF access to millions of items.

https://iiif.europeana.eu/presentation/{collection}/{id}/manifest

Note

For CHORAL researchers:
Search for heritage from Spain, Portugal, France, Italy, Romania — manuscripts, photographs, maps, objects — then import the Manifest directly into Arvest.

🔗 europeana.eu

IIIF — Omeka S

With a IIIF module such as IIIF Server, Omeka S can auto-generate IIIF Manifests for every item with an image.

With the appropriate module, Omeka S can also import IIIF Manifests and create items from them.

IIIF — Nakala

Nakala (Huma-Num) is a research data repository that auto-generates IIIF endpoints for all deposited images.

Deposit your images → get a DOI + IIIF URL
Import the Manifest in Arvest
Annotate with collaborators
Export for publication or ML

Note

This is a complete, FAIR-compliant workflow for image corpora in the humanities.

🔗 nakala.fr

Part 3
– Tools for Working with Image Corpora

Tropy

From archival photos to structured, linked research data

– What is Tropy?

A free, open-source desktop app for organizing and describing research photographs.

Not a generic photo manager — built specifically for historians, art historians, and humanists.

Created by:

RRCHNM — Roy Rosenzweig Center for History and New Media (George Mason U.)
C²DH — Luxembourg Centre for Contemporary and Digital History

First release: 2017 · License: AGPL-3.0

🔗 tropy.org
📚 docs.tropy.org
💬 forums.tropy.org

– Tropy is NOT…

A photo editor (Photoshop, Lightroom…)
A reference manager (that’s Zotero)
A writing platform
An online publishing platform (that’s Omeka)

✅ The missing link between your camera roll and your structured research data.

– The Core Workflow

Import photos from your archival sessions — including PDFs
Group related photos into items
Describe with rich, linked metadata
Annotate regions of interest
Export as JSON-LD, CSV, or to Omeka S

Tip

One Tropy project per research trip or archive collection.

– The Tropy Interface

4 main panels:

Project (left) — lists, tags, saved searches
Item grid (center) — browse all items
Metadata panel (right) — describe the selected item
Viewer — view and annotate photos

Two project modes:

Mode	Behaviour
Standard	Copies files → portable
Advanced	Links to originals → lighter

For archival work: Standard is safer.

Tip

Before your first import, set a default template in
Edit → Preferences → Settings — all imported items will use it automatically.

– Three Levels of Description

📁 Item — the primary unit

Groups logically related photos. Example: a document photographed recto/verso = 1 item, 2 photos.

– Three Levels of Description

🖼️ Photo — one image file

Metadata: filename, dimensions, date taken.

You rarely describe photos individually — the item is the primary unit of description.

– EXIF: Metadata Already in the File

Some metadata is embedded in the image file itself by the camera or scanner — no description needed.

Tropy extracts automatically:

Filename
Date & time of capture
Dimensions (pixels)
File size (bytes)

Available if GPS was on:

Latitude / longitude

The key distinction:

	Who writes it?	What kind?
EXIF	The device	Technical
DC / CIDOC	The researcher	Semantic

EXIF describes how the photo was made.
DC describes what it shows.

Tip

In Tropy, EXIF properties belong in a photo-level template — not an item template. This lets you surface technical data (camera model, GPS, resolution) alongside your semantic description.
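EXIF stores GPS positions as degrees, minutes and seconds (three rational numbers) plus a hemisphere letter, while GeoNames and most mapping tools use signed decimal degrees. A small conversion sketch; the sample values are hypothetical, not read from a real file.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """EXIF (degrees, minutes, seconds) + 'N'/'S'/'E'/'W' -> signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    return float(-value if ref.upper() in ("S", "W") else value)


# Hypothetical GPSLatitude / GPSLongitude values, stored by the camera as rationals.
lat = dms_to_decimal((Fraction(43), Fraction(29), Fraction(2934, 100)), "N")
lon = dms_to_decimal((Fraction(1), Fraction(31), Fraction(1230, 100)), "W")
print(round(lat, 5), round(lon, 5))
```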

– Three Levels of Description

🔍 Selection — a cropped region

A seal, a signature, an inscription, a motif.

Has its own title, notes, and tags — linked to pixel coordinates in the image.

– Templates

A template defines which metadata fields appear for an item, a photo, or a selection; each field is a property from a vocabulary.

Built-in:

Tropy Generic — Dublin Core
Tropy Correspondence — letters
Tropy Photo — photo-level metadata

Also available (import):

CIDOC-CRM · VRA Core · Schema.org

Custom / imported:

Edit → Preferences → Vocabularies → Import

Any RDF/OWL schema (JSON-LD or Turtle) → then use those properties in your templates.

– Plugins

Import

CSV Import
Omeka S Import
IIIF Import

Export

Omeka S Export
CSV Export
CSL / Zotero Export

🔗 tropy.org

Experimental: tropy-plugin-nakala by Bruno Morandière

– Learning Tropy

English:

📚 docs.tropy.org
▶️ vimeo.com/user104478141
▶️ youtube.com/@tropy
💬 forums.tropy.org

– Hands-On: Tropy

We will explore together:

Create a new project and import a folder of photos
Create an item from multiple photos (recto / verso)
Apply a template and fill in metadata
Create a selection on a detail
Use tags and saved searches
Export as JSON-LD

Beyond Description — Visualizing Your Corpus

Once your images are described and structured, new possibilities open up.

VIKUS Viewer — developed at FH Potsdam’s Urban Complexity Lab — arranges thousands of cultural artifacts on a dynamic canvas, letting you explore thematic and temporal patterns across an entire collection at a glance.

Items positioned along a timeline
Keywords visualized as an interactive frequency map
Zoom into high-resolution textures
Runs in the browser — no installation

Note

I won’t go into this further today — but it’s a beautiful example of what a well-described corpus enables.

Once you’ve done the work in Tropy, tools like this become possible.

🔗 vikusviewer.fh-potsdam.de

Arvest

Annotate, collaborate, expose IIIF collections

– What is Arvest?

A web platform for working with IIIF image collections — no installation needed.

Import local images or IIIF Manifests
Annotate with the W3C Web Annotation standard
Collaborate with shared workspaces
Expose data via a REST API
Export for machine learning pipelines

🔗 arvest.app

– Importing — Local Files

Create a workspace
Import → Upload files
Arvest generates a IIIF Manifest automatically

Supported formats: JPEG, PNG, TIFF, WebP

– Importing — IIIF Manifest

Find a Manifest URL (Europeana, Nakala, Omeka S, Gallica…)
Import → IIIF Manifest URL
Paste → all canvases and metadata are imported

https://gallica.bnf.fr/iiif/ark:/12148/btv1b8452439r/manifest.json

– Annotating

Arvest uses the W3C Web Annotation standard.

Annotation types:

Region — box or polygon
Point — a specific location
Full-canvas — note on the whole image

Each annotation has:

a body (text, tag, or URI)
a motivation
creator + date metadata
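A sketch of a region annotation following the W3C Web Annotation data model: a body, a motivation, creator and date, and a target pointing at a rectangle of a canvas through a media-fragment `xywh` selector. The annotation, canvas and user URIs are hypothetical, and this is not necessarily the exact JSON that Arvest produces. Here the body links the region to a GeoNames URI with the `identifying` motivation.

```python
import json
from datetime import datetime, timezone


def text_body(value: str) -> dict:
    """Free-text body, e.g. a comment or a transcription."""
    return {"type": "TextualBody", "value": value, "format": "text/plain"}


def uri_body(uri: str) -> dict:
    """Body that is an external resource, e.g. a controlled-vocabulary URI."""
    return {"id": uri}


def region_annotation(anno_id, canvas_uri, xywh, body, creator, motivation="commenting"):
    """W3C Web Annotation targeting a rectangle (x, y, w, h) of a canvas."""
    x, y, w, h = xywh
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": motivation,
        "creator": creator,
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "body": body,
        "target": {
            "source": canvas_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


anno = region_annotation(
    "https://example.org/anno/1",
    "https://example.org/iiif/doc1/canvas/1",
    (100, 200, 300, 150),
    uri_body("https://sws.geonames.org/2988507/"),
    creator="https://example.org/users/researcher1",
    motivation="identifying",
)
print(json.dumps(anno, indent=2))
```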

– Uses in Humanities Research

Identify depicted persons, places, objects
Transcribe visible text or inscriptions
Tag iconographic themes
Link regions to controlled vocabulary URIs
Compare motifs across a corpus

– Collaborating

Shared workspaces:

Invite collaborators by email
Roles: viewer · annotator · editor · admin
All annotations visible to the team
Comment threads · activity log

Tip

For CHORAL:
Invite partners across institutions, annotate the same corpus collaboratively — across national borders.

– Annotation Workflow

Lead researcher imports corpus + sets guidelines
Collaborators annotate independently
Review — conflicts flagged
Consensus — annotations finalized
Export for publication or ML
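How conflicts are flagged is not described here, so the following is only one plausible approach: compare two annotators' boxes for the same label with intersection over union (IoU), and flag low overlaps or labels that only one person used. The 0.5 threshold is a common object-detection convention, not an Arvest setting, and the sketch assumes one box per label per annotator.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes, between 0 and 1."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def flag_conflicts(ann_a, ann_b, threshold=0.5):
    """Compare two annotators' {label: (x, y, w, h)} dicts for review."""
    shared = ann_a.keys() & ann_b.keys()
    return {
        "low_overlap": sorted(
            label for label in shared if iou(ann_a[label], ann_b[label]) < threshold
        ),
        "only_one": sorted(ann_a.keys() ^ ann_b.keys()),
    }


# Hypothetical annotations by two collaborators on the same canvas.
alice = {"seal": (10, 10, 100, 100), "signature": (300, 400, 200, 50)}
bob = {"seal": (15, 12, 100, 100), "signature": (500, 400, 200, 50), "date": (20, 500, 80, 30)}
print(flag_conflicts(alice, bob))  # prints: {'low_overlap': ['signature'], 'only_one': ['date']}
```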

– Exposing Data via API

GET /api/v1/workspaces/{id}/manifests
GET /api/v1/workspaces/{id}/annotations
GET /api/v1/manifests/{id}/canvas/{n}

→ Your annotated corpus becomes a queryable dataset

Export formats for Machine Learning:

COCO — object detection
CSV / JSON — text classification
IIIF + W3C — ML pipelines
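To illustrate the COCO route, here is a sketch that converts W3C annotations with rectangular `xywh` selectors into COCO-style detection records. COCO's `bbox` is also `[x, y, width, height]` measured from the top-left corner, so the numbers carry over directly. The mappings from canvas URIs and labels to integer ids are assumptions you would supply, and polygon (SVG) selectors are simply skipped.

```python
import re

# Media-fragment rectangle, e.g. "xywh=10,20,30,40" or "xywh=pixel:10,20,30,40".
XYWH = re.compile(r"xywh=(?:pixel:)?(\d+),(\d+),(\d+),(\d+)")


def to_coco(annotations, image_ids, category_ids):
    """Convert W3C annotations with xywh fragment selectors into COCO-style records.

    image_ids:    canvas URI -> integer image id (assumed, supplied by you)
    category_ids: body URI or text -> integer category id (assumed, supplied by you)
    """
    records = []
    for anno in annotations:
        target = anno["target"]
        found = XYWH.fullmatch(target["selector"]["value"])
        if found is None:
            continue  # non-rectangular selectors (e.g. SVG polygons) are skipped here
        x, y, w, h = map(int, found.groups())
        body = anno["body"]
        label = body.get("id") or body.get("value")
        records.append({
            "id": len(records) + 1,
            "image_id": image_ids[target["source"]],
            "category_id": category_ids[label],
            "bbox": [x, y, w, h],   # COCO: top-left x, y, then width, height
            "area": w * h,
            "iscrowd": 0,
        })
    return records


canvas = "https://example.org/iiif/doc1/canvas/1"
annos = [
    {"body": {"id": "https://sws.geonames.org/2988507/"},
     "target": {"source": canvas,
                "selector": {"type": "FragmentSelector", "value": "xywh=100,200,300,150"}}},
    {"body": {"type": "TextualBody", "value": "seal"},
     "target": {"source": canvas,
                "selector": {"type": "SvgSelector",
                             "value": "<svg xmlns='http://www.w3.org/2000/svg'></svg>"}}},
]
coco = to_coco(annos, {canvas: 1}, {"https://sws.geonames.org/2988507/": 1, "seal": 2})
print(coco)
```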

Note

Annotate once → reuse everywhere.

– The Full Workflow

flowchart LR
    A["📷 Archival<br/>Photos"] --> B["Tropy<br/>Organize & Describe"]
    B -- "JSON-LD" --> C["Omeka S<br/>or Nakala"]
    C -- "IIIF Manifest" --> D["Arvest<br/>Annotate & Collaborate"]
    D -- "API / Export" --> E["Publication<br/>or ML"]
    F["Europeana · Gallica<br/>Other IIIF sources"] --> D
    G["GeoNames · Pactols<br/>Openthéso"] --> B & D

Summary

Key principles:

Use URIs instead of free text
Choose a metadata schema suited to your sources
Link to controlled vocabularies
IIIF is the interoperability layer for images

Your next steps:

Install Tropy
Register on Arvest
Find IIIF content on Europeana
Deposit images in Nakala
Create a collaborative workspace


Questions?

Julien Rabaud · UPPA · SCD · Pôle Numérique

📧 julien.rabaud@univ-pau.fr

ujubib.github.io/ed-tropy-arvest/slides.html

References

Semantic Web & RDF: w3.org/TR/rdf11-primer

Dublin Core: dublincore.org

CIDOC-CRM: cidoc-crm.org

GeoNames: geonames.org

Pactols / Openthéso: pactols.frantiq.fr · opentheso.huma-num.fr

IIIF: iiif.io · iiif.io/api/cookbook

Tropy: docs.tropy.org · forums.tropy.org

Nakala: nakala.fr

Arvest: arvest.app

VIKUS Viewer: vikusviewer.fh-potsdam.de
