From 4b17a172e621e457a10c45f84728893693f97256 Mon Sep 17 00:00:00 2001
From: Alice BRENON <alice.brenon@ens-lyon.fr>
Date: Mon, 28 Feb 2022 18:50:56 +0100
Subject: [PATCH] Reworking the introduction, added the abstract

---
 ICHLL_Brenon.md | 94 +++++++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 35 deletions(-)

diff --git a/ICHLL_Brenon.md b/ICHLL_Brenon.md
index c9f2e28..8b19eaf 100644
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -12,21 +12,64 @@ header-includes:
 	}
 ---
 
+# Abstract {-}
+
+As witnesses to scientific progress, dictionaries and encyclopedias draw much
+interest from digital humanities (Roe et al. (2016), Williams (2017), Vigier et
+al. (2019), …), which accounts for the number of projects making them available
+to the public (such as ARTFL[1], ENCCRE[2], COLLEX-LGE[3] and NENUFAR[4]) or
+studying them in diachrony (such as BASNUM[5] and GEODE[6]).
+
+The volume of data involved poses a technical challenge for the digitization
+process required to study historical dictionaries. XML-TEI, a major standard,
+includes a specialized module for dictionaries, but it has shown some
+limitations when used to encode «La Grande Encyclopédie», a late-19th-century
+encyclopedia (Jacquet-Pfau (2015)), from an OCRized version in XML-ALTO.
+
+We describe those limitations and identify the fundamental differences that
+prevent encoding encyclopedias with the XML-TEI module for dictionaries. We then
+propose alternative encodings for encyclopedias while discussing their
+advantages and drawbacks, including a fully XML-TEI-compliant scheme suitable
+for automated processes.
+
+
 # Dictionaries and encyclopedias
 
+After emerging from dictionaries during the 18\textsuperscript{th} century,
+encyclopedias became a fertile subgenre in themselves and a rich subject of
+study for digital humanities because of their particular relation to knowledge
+and its evolution.
+
+The CollEx-Persée project DISCO-LGE[^DISCOLGE] set out to study *La Grande
+Encyclopédie, Inventaire raisonné des Sciences, des Lettres et des Arts par une
+Société de savants et de gens de lettres*, which was published between 1885 and
+1902 by an organised team of over two hundred specialists divided into eleven
+sections. The project's aim was to digitise *La Grande Encyclopédie* and make
+it available to the scientific community as well as the general public. A
+previous version of this encyclopedia was partially available on
+Gallica[^Gallica] but lacked quality, and its text had not been fully extracted
+from the pictures with an Optical Character Recognition (OCR) system.
+
+[^DISCOLGE]: [https://www.collexpersee.eu/projet/disco-lge/](https://www.collexpersee.eu/projet/disco-lge/)
+[^Gallica]: [https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22](https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22)
+
+The author would like to thank the CollEx-Persée group for supporting the
+DISCO-LGE project and is also grateful to the ASLAN project (ANR-10-LABX-0081)
+of the Université de Lyon, for its financial support within the French program
+"Investments for the Future" operated by the National Research Agency (ANR).
+
+## "Encyclopedia"
+
 In common parlance, the terms "dictionaries" and "encyclopedias" are used as
 near synonyms to refer to books compiling vast amounts of knowledge into lists
 of definitions ordered alphabetically. Their similarity is even visible in the
 way they are coordinated in the full title of the *Encyclopédie ou Dictionnaire
 raisonné des sciences des arts et des métiers* published by Diderot and
 d'Alembert between 1751 and 1772 and which is probably the most famous work of
-the genre and a symbol of the Age of Enlightenment.
-
-## "Encyclopedia"
-
-If the word "encyclopedia" is nowadays part of our vocabulary, it was much more
-unusual and in fact controversial when Diderot and d'Alembert decided to use it
-in the title of their book.
+the genre and a symbol of the Age of Enlightenment. If the word "encyclopedia"
+is nowadays part of our vocabulary, it was much more unusual and in fact
+controversial when Diderot and d'Alembert decided to use it in the title of
+their book.
 
 The definition given by Furetière in his *Dictionnaire Universel* in 1690 is
 still close to its greek etymology: a "ring of all knowledges", from *κύκλος*,
@@ -122,36 +165,17 @@ d'Alembert insists on the importance of brevity for a clear definition in the
 consider encyclopedias superior to dictionaries but really as a new subgenre
 departing from them in terms of purpose.
 
-## La Grande Encyclopédie
-
-After emerging from dictionaries during the 18\textsuperscript{th} century,
-encyclopedias became a fertile subgenre in themselves which kept evolving over
-the following centuries. One of offsprings of the *Encyclopédie* from the
-19\textsuperscript{th} century is entitled *La Grande Encyclopédie, Inventaire
-raisonné des Sciences, des Lettres et des Arts par une Société de savants et de
-gens de lettres* and was published between 1885 and 1902 by an organised team of
-over two hundred specialists divided into eleven sections. The aim of
-CollEx-Persée project DISCO-LGE[^DISCOLGE] was to digitise and make *La Grande
-Encyclopédie* available to the scientific community as well as the general
-public. A previous version was partially available on
-Gallica[^Gallica]
-but lacked in quality and its text had not been fully extracted from the
-pictures with an Optical Characters Recognition (OCR) system.
-
-[^DISCOLGE]: [https://www.collexpersee.eu/projet/disco-lge/](https://www.collexpersee.eu/projet/disco-lge/)
-[^Gallica]: [https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22](https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22)
-
 # The *dictionaries* TEI module
 
-Producing data useful to future other scientific projects cannot be achieved
-unless it is *interoperable* and *reusable*. These are the two last key aspects
-of the FAIR[^FAIR] principles (*findability*, *accessibility*,
-*interoperability* and *reusability*) which we strive to follow as a guideline
-for efficient and quality research. It entails using standard formats and a
-standard for encoding historical texts in the context of digital humanities is
-XML-TEI, collectively developped by the *Text Encoding Initiative* consortium.
-It consists in a set of technical specifications under the form of XML schemas,
-along with a range of tools to handle them and training resources.
+Data produced in the context of a project such as DISCO-LGE cannot be useful to
+other future scientific projects unless it is *interoperable* and *reusable*.
+These are the last two key aspects of the FAIR[^FAIR] principles (*findability*,
+*accessibility*, *interoperability* and *reusability*), which we strive to follow
+as a guideline for efficient and quality research. This entails using standard
+formats, and a standard for encoding historical texts in the context of digital
+humanities is XML-TEI, collectively developed by the *Text Encoding Initiative*
+consortium. It consists of a set of technical specifications in the form of
+XML schemas, along with a range of tools to handle them and training resources.
 
 [^FAIR]: [https://www.go-fair.org/fair-principles/](https://www.go-fair.org/fair-principles/)
 
-- 
GitLab