CLARIN ERIC

Common Language Resources and Technology Infrastructure

What is CLARIN?

CLARIN (Common Language Resources and Technology Infrastructure) is a European Research Infrastructure Consortium (ERIC) that provides access to digital language data, natural language processing (NLP) tools, and services for researchers in the social sciences and humanities (SSH). It was established as an ERIC in 2012 and has 22 full members.

Note

In France, CLARIN-FR is coordinated by Huma-Num, which provides the national node. The Huma-Num consortium CORLI (spoken language corpora) is France’s main contribution to CLARIN’s data infrastructure.

Scope and Relevance

CLARIN is designed primarily for language-based research, but its tools and corpora are useful across many disciplines in the humanities and social sciences:

Research area	CLARIN tools and resources
Linguistics & NLP	Annotated corpora, lexicons, POS taggers
Literary studies	Stylometric analysis, historical corpora, TEI editions
History	Parliamentary corpora, newspaper archives, Named Entity Recognition
Sociology	Social media datasets, discourse analysis
Oral history	Spoken language corpora (via CORLI/Huma-Num)

Key Tools

Virtual Language Observatory (VLO) — search across thousands of language resources from CLARIN member centres.

Language Resource Switchboard — upload a text and get a list of NLP tools appropriate for your task (tokenisation, NER, sentiment analysis, etc.).

WebLicht — compose NLP service chains into reusable analysis workflows.

CLARIN Learning Hub — training modules and course materials for using language resources.
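
The idea behind a service chain, as WebLicht offers it, can be illustrated with a small Python sketch. This is only a conceptual analogy and does not call WebLicht or any other CLARIN service: the step functions `tokenise` and `tag_capitalised` and the `chain` helper are hypothetical stand-ins invented for this example. Each step consumes the output of the previous one, and the composed workflow can be reused on new texts.

```python
from functools import reduce
from typing import List, Tuple


def tokenise(text: str) -> List[str]:
    """Toy tokeniser (hypothetical): split on whitespace, strip punctuation."""
    stripped = (tok.strip(".,;:!?") for tok in text.split())
    return [tok for tok in stripped if tok]


def tag_capitalised(tokens: List[str]) -> List[Tuple[str, str]]:
    """Toy tagger (hypothetical): label capitalised tokens NAME, others O."""
    return [(tok, "NAME" if tok[:1].isupper() else "O") for tok in tokens]


def chain(*steps):
    """Compose steps left to right into a single reusable workflow."""
    def run(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


# Build the workflow once, then apply it to any input text.
workflow = chain(tokenise, tag_capitalised)
result = workflow("CLARIN supports researchers in Riga.")
print(result)
```

In a real chain the steps would be remote NLP services rather than local functions, but the design is the same: the order of the steps matters, and each step must accept the format that the previous one produces.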

CLARIN, DARIAH, and EOSC

CLARIN and DARIAH (Digital Research Infrastructure for the Arts and Humanities) work closely together:

They co-develop the SSH Open Marketplace (with CESSDA)
They jointly participate in the European Open Science Cloud (EOSC)
They organised a joint Spring Conference in Riga (March 2026)
They collaborate in the ATRIUM project on shared services for humanities research

→ For the full picture, see How They Connect.

🔗 https://www.clarin.eu