Commit 4b17a172 authored by Alice Brenon

Reworking the introduction, added the abstract

parent 307ffd79
......@@ -12,21 +12,64 @@ header-includes:
}
---
# Abstract {-}
As witnesses to scientific progress, dictionaries and encyclopedias draw much
interest from digital humanities (Roe et al. (2016), Williams (2017), Vigier et
al. (2019), …), which accounts for the number of projects making them available
to the public (such as ARTFL[1], ENCCRE[2], COLLEX-LGE[3] and NENUFAR[4]) or
studying them in diachrony (such as BASNUM[5] and GEODE[6]).

The volume of data involved poses a technical challenge to the digitizing
process required for the study of historical dictionaries. XML-TEI, a major
standard, includes a specialized module for dictionaries, but it has shown some
limitations when used to encode «La Grande Encyclopédie», a late-19th century
encyclopedia (Jacquet-Pfau (2015)), from an OCRed version in XML-ALTO.

We describe those limitations and identify the fundamental differences that
prevent encoding encyclopedias with the XML-TEI module for dictionaries. We then
propose alternative encodings for encyclopedias and discuss their advantages and
drawbacks, including a fully XML-TEI-compliant scheme suitable for automated
processes.
# Dictionaries and encyclopedias
After emerging from dictionaries during the 18\textsuperscript{th} century,
encyclopedias became a fertile subgenre in themselves and a rich subject of
study for digital humanities because of their particular relation to knowledge
and its evolution.

The CollEx-Persée project DISCO-LGE[^DISCOLGE] set out to study *La Grande
Encyclopédie, Inventaire raisonné des Sciences, des Lettres et des Arts par une
Société de savants et de gens de lettres*, which was published between 1885 and
1902 by an organised team of over two hundred specialists divided into eleven
sections. Its aim was to digitise *La Grande Encyclopédie* and make it available
to the scientific community as well as the general public. A previous version of
this encyclopedia was partially available on Gallica[^Gallica], but it was of
low quality and its text had not been fully extracted from the page images with
an Optical Character Recognition (OCR) system.
[^DISCOLGE]: [https://www.collexpersee.eu/projet/disco-lge/](https://www.collexpersee.eu/projet/disco-lge/)
[^Gallica]: [https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22](https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22)
The author would like to thank the CollEx-Persée group for supporting the
DISCO-LGE project and is also grateful to the ASLAN project (ANR-10-LABX-0081)
of the Université de Lyon, for its financial support within the French program
"Investments for the Future" operated by the National Research Agency (ANR).
## "Encyclopedia"
In common parlance, the terms "dictionaries" and "encyclopedias" are used as
near synonyms to refer to books compiling vast amounts of knowledge into lists
of definitions ordered alphabetically. Their similarity is even visible in the
way they are coordinated in the full title of the *Encyclopédie ou Dictionnaire
raisonné des sciences des arts et des métiers*, published by Diderot and
d'Alembert between 1751 and 1772, which is probably the most famous work of
the genre and a symbol of the Age of Enlightenment. Although the word
"encyclopedia" is nowadays part of our vocabulary, it was much more unusual and
in fact controversial when Diderot and d'Alembert decided to use it in the title
of their book.
The definition given by Furetière in his *Dictionnaire Universel* in 1690 is
still close to its Greek etymology: a "ring of all knowledges", from *κύκλος*,
......@@ -122,36 +165,17 @@ d'Alembert insists on the importance of brevity for a clear definition in the
consider encyclopedias superior to dictionaries but really as a new subgenre
departing from them in terms of purpose.
## La Grande Encyclopédie
After emerging from dictionaries during the 18\textsuperscript{th} century,
encyclopedias became a fertile subgenre in themselves which kept evolving over
the following centuries. One of the offspring of the *Encyclopédie* from the
19\textsuperscript{th} century is entitled *La Grande Encyclopédie, Inventaire
raisonné des Sciences, des Lettres et des Arts par une Société de savants et de
gens de lettres* and was published between 1885 and 1902 by an organised team of
over two hundred specialists divided into eleven sections. The aim of the
CollEx-Persée project DISCO-LGE[^DISCOLGE] was to digitise *La Grande
Encyclopédie* and make it available to the scientific community as well as the
general public. A previous version was partially available on
Gallica[^Gallica], but it was of low quality and its text had not been fully
extracted from the page images with an Optical Character Recognition (OCR)
system.
# The *dictionaries* TEI module
Data produced in the context of a project such as DISCO-LGE cannot be useful to
other scientific projects in the future unless it is *interoperable* and
*reusable*. These are the last two key aspects of the FAIR[^FAIR] principles
(*findability*, *accessibility*, *interoperability* and *reusability*), which we
strive to follow as a guideline for efficient, high-quality research. This
entails using standard formats, and a standard for encoding historical texts in
the context of digital humanities is XML-TEI, collectively developed by the
*Text Encoding Initiative* consortium. It consists of a set of technical
specifications in the form of XML schemas, along with a range of tools to handle
them and training resources.
[^FAIR]: [https://www.go-fair.org/fair-principles/](https://www.go-fair.org/fair-principles/)
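
To give a concrete idea of what an XML-TEI encoding looks like and why a
standard format helps reuse, here is a minimal sketch in Python. It parses one
dictionary entry written with elements from the TEI *dictionaries* module
(`<entry>`, `<form>`, `<orth>`, `<gramGrp>`, `<pos>`, `<sense>`, `<def>`) and
extracts its headword, part of speech and definitions. The sample entry, its
wording and the helper name `summarize_entry` are assumptions made up for this
illustration; they are not taken from *La Grande Encyclopédie* or from the
DISCO-LGE project.

```python
import xml.etree.ElementTree as ET

# Namespace shared by all TEI elements.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical entry, invented for illustration only (not from any corpus).
SAMPLE = f"""
<entry xmlns="{TEI_NS}">
  <form type="lemma"><orth>encyclopédie</orth></form>
  <gramGrp><pos>noun</pos></gramGrp>
  <sense n="1"><def>A work compiling knowledge from many fields.</def></sense>
</entry>
"""


def summarize_entry(xml_text):
    """Return the headword, part of speech and definitions of one <entry>."""
    ns = {"tei": TEI_NS}
    entry = ET.fromstring(xml_text.strip())
    return {
        "headword": entry.findtext("tei:form/tei:orth", namespaces=ns),
        "pos": entry.findtext("tei:gramGrp/tei:pos", namespaces=ns),
        "definitions": [d.text for d in entry.findall("tei:sense/tei:def", ns)],
    }


if __name__ == "__main__":
    print(summarize_entry(SAMPLE))
```

Because the element names and the namespace are fixed by the standard, the same
few lines would work on any entry encoded this way, whichever project produced
it.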