Last updated
July 23, 2017
Terms of use
Open use. Must provide the source.
Organization
OpenGLAM CH Working Group

Description

The Historical database of Le Temps comprises three newspapers spanning 200 years. The Journal de Genève ran from 1826 to 1998, the Gazette de Lausanne (under different names) from 1798 to 1998, and the Nouveau Quotidien from 1991 to 1998, before they merged into the present-day Le Temps. The full archive was digitized and OCRed (that is, its text was extracted in digital form) in 2008.

The project is of interest to the community because newspaper databases are becoming widespread across Europe, and dedicated techniques are needed to fully exploit them.

We share all the articles of the Journal de Genève (JDG) and the Gazette de Lausanne (GDL) for the year 1914 in text (OCRed) form, as XML files. Articles come in two flavors: raw and annotated. Raw means that for each month and issue there is an XML file with the full text of each article, some metadata, and a division into columns (roughly, the formatting units on the page). Annotated means that we fed our data to three Named Entity Recognition (NER) and Disambiguation (NED) services for French and merged their results into a single annotation.
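As a rough illustration of how the raw files could be read, the following is a minimal sketch in Python. The dataset's XML schema is not described here, so every element and attribute name in the sample (`issue`, `article`, `title`, `column`, `newspaper`, `date`, `id`, `n`) is a hypothetical placeholder and must be adapted to the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names below are assumptions
# made for illustration, not the dataset's documented schema.
SAMPLE = """<issue newspaper="JDG" date="1914-08-01">
  <article id="a1">
    <title>Exemple</title>
    <column n="1">Premier bloc de texte.</column>
    <column n="2">Second bloc de texte.</column>
  </article>
</issue>"""


def read_articles(xml_text):
    """Return one dict per article, with issue metadata and joined column text."""
    root = ET.fromstring(xml_text)
    articles = []
    for art in root.iter("article"):
        # Columns are the formatting units on the page; join them in document order.
        columns = [(col.text or "").strip() for col in art.iter("column")]
        articles.append({
            "newspaper": root.get("newspaper"),
            "date": root.get("date"),
            "id": art.get("id"),
            "title": (art.findtext("title") or "").strip(),
            "text": " ".join(c for c in columns if c),
        })
    return articles


articles = read_articles(SAMPLE)
for a in articles:
    print(a["newspaper"], a["date"], a["id"], a["text"])
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the tag names adjusted once the actual structure has been inspected.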

Resources

Additional information

Identifier
dhlab-jdg-gdl-1914@openglam
Release date
July 23, 2017
Modification date
July 23, 2017
Publisher
EPFL-CDH-DHI-DHLAB
Contact points
EPFL-CDH-DHI-DHLAB
Languages
French
Further information
-
Landing page
https://dhlab.epfl.ch/page-96354-en.html
Documentation
Temporal coverage
-
Spatial coverage
-
Update interval
Irregular
Metadata access
API (JSON), Download XML

Have questions?

Ask the publisher directly

EPFL-CDH-DHI-DHLAB