title: Encoding the Specificities of Encyclopedias
author: Alice [Brenon]{.smallcaps} ^1,2^
institute:
	- ICAR, CNRS, UMR5191, 69342
	- Univ Lyon, INSA Lyon, CNRS, UCBL, LIRIS, UMR5205, F-69621
numbersections: True
documentclass: article
classoption:
	- english
	- a4paper
	- 12pt
mainfont: "Libertinus Serif"
header-includes:
	- \usepackage{textalpha}
	- \usepackage{hyperref}
	- \usepackage{geometry}
	- \geometry{margin=25.4mm}
	- \hypersetup{
	        colorlinks,
	        linkcolor = blue,
	        urlcolor = blue
	    }

\begin{center} {\small \textsuperscript{1} ICAR, CNRS, UMR5191, 69342}\\ {\small \textsuperscript{2} Univ Lyon, INSA Lyon, CNRS, UCBL, LIRIS, UMR5205, F-69621}\\ \end{center}

Abstract This chapter illustrates the fundamental differences between dictionaries and encyclopedias by documenting the process of devising an encoding scheme and applying it to a late-19^th^ century encyclopedia, "La Grande Encyclopédie" (henceforth LGE). The effort, made in the context of the DISCO-LGE project, consisted in working from an OCRised version of the pages in XML-ALTO to produce a fully XML-TEI-compliant encoding of the individual articles. Although the TEI guidelines include a specialised module for dictionaries, which was identified as a promising tool for the task, a systematic traversal of the schema using graph search methods revealed some limitations of this module for encoding this text. These shortcomings are reviewed and illustrated with a series of examples. An alternative encoding remaining within the core module of TEI is then proposed and demonstrated on articles from LGE containing key features. Finally, the different strategies followed by other projects are discussed.

Keywords digital humanities, XML-TEI, dictionaries, encyclopedias

Introduction

Although both terms have been used rather interchangeably over the past few centuries, a dichotomy is now commonly made between dictionaries and encyclopedias. A simple opposition can easily justify this distinction: dictionaries define words and tell one how to use them, while encyclopedias usually go into longer developments to give a more comprehensive and scientific understanding of the concept being defined. This common intuition goes back to the article written in the Encyclopédie, ou Dictionnaire raisonné des sciences, des arts et des métiers (henceforth EDdA) by @dalembert_dictionnaire_2022 [article DICTIONNAIRE, volume 4], who distinguishes three kinds of dictionaries: one to define words, the second to define facts and the last one to define things, corresponding respectively to language, history, and science and arts dictionaries. The first type corresponds to modern dictionaries, while the other two are similar to what one expects to find in an encyclopedia.

However, d'Alembert himself does not consider these boundaries to be very strict, and he hints at the extreme difficulty of merely defining words without going into semantic and philosophical considerations:

un dictionnaire de langues, qui paroît n'être qu'un dictionnaire de mots, doit être souvent un dictionnaire de choses quand il est bien fait

("a language dictionary, which appears to be only a dictionary of words, must often be a dictionary of things when it is done properly"). A similar criticism is made by @haiman_dictionaries_1980 [p. 331], who attacks no fewer than six criteria on which dictionaries and encyclopedias are generally opposed and concludes that there is no distinction between them because "dictionaries are encyclopedias". Regardless of the validity of his reasoning, it only establishes one inclusion, namely that dictionaries may be a special case of encyclopedias. As will be shown, this by no means implies the converse, that encyclopedias are dictionaries.

XML-TEI is a set of guidelines, tools and training resources collectively developed by the @tei_consortium_tei_2023 to represent text in a highly structured and machine-readable format. Its toolbox has a modular structure consisting of optional modules, each covering specific needs such as the physical features of a source document, the transcription of oral corpora, or the particular requirements of textual domains like poetry or, in the case at hand, dictionaries. The intrinsic complexity of dictionaries has been well identified since the inception of the project [@tei_vault], and @ide_encoding_1995 underline the amount of work that went into the third version of the guidelines (P3) to provide a toolbox both general and expressive enough to account for the variety of conventions found in dictionaries. This module has been successfully used to encode both historical [@williams2017; @bohbot2018] and born-digital dictionaries [@bowers_bridging_2018]. In addition, a specific set of guidelines tailored to encoding dictionaries, named TEI-Lex0, has also been published [@banski_tei_lex0_2017].
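To make the idea of a dedicated module more concrete, the following sketch builds the skeleton of a dictionary article with Python's standard `xml.etree.ElementTree`, using the `entry`, `form`, `orth`, `sense` and `def` elements of the TEI dictionaries module. The helper `make_entry` is hypothetical, and the headword and sense texts are mere placeholders; an actual encoding would carry much richer information (grammatical groups, citations, cross-references).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements under a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def make_entry(headword: str, senses: list[str]) -> ET.Element:
    """Hypothetical helper: build an <entry> holding one lemma <form>
    and one numbered <sense>/<def> pair per definition string."""
    entry = ET.Element(tei("entry"))
    form = ET.SubElement(entry, tei("form"), type="lemma")
    ET.SubElement(form, tei("orth")).text = headword
    for n, text in enumerate(senses, start=1):
        sense = ET.SubElement(entry, tei("sense"), n=str(n))
        ET.SubElement(sense, tei("def")).text = text
    return entry


# Placeholder headword and definitions, for illustration only.
entry = make_entry("dictionnaire", ["first sense", "second sense"])
print(ET.tostring(entry, encoding="unicode"))
```

This headword-plus-senses mould suits short lexical entries; whether it can accommodate the long discursive articles of an encyclopedia such as LGE is precisely what the rest of this chapter examines.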

The TEI effort is described by @ide_background_1998 as "first steps" towards a standard for encoding corpora and a common basis for corpus comparison and reuse. They point out some minor inconsistencies in the design, remark that there is generally more than one way to encode a given text in XML-TEI, and identify nine criteria for designing a sound standard. Their claims are backed by concrete examples of encoding situations, but these give no indication of how prevalent the reported issues are. In fact, the sheer complexity of the guidelines can make it hard to ascertain whether a particular element structure is impossible to represent (failing to find a suitable encoding does not prove that none exists). This chapter uses results from graph theory to study systematically the possibilities and shortcomings of the TEI dictionaries module, thereby providing additional evidence that encyclopedias are not dictionaries and that the inclusion claimed by Haiman is a strict one.
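The exhaustiveness argument can be sketched as a reachability question on a graph whose vertices are element names and whose edges record which element may directly contain which. Since such a graph is finite, a breadth-first search that terminates without meeting the target element shows that no nesting exists, rather than merely reflecting a failed attempt. The sketch below is only an illustration of this idea and not the implementation used in this study: the function name `nesting_path` and the `CONTENT_MODEL` edges are hypothetical, and a real analysis would have to derive the graph from the TEI schema itself.

```python
from collections import deque

# HYPOTHETICAL toy schema: each element name maps to the elements it may
# directly contain. These edges are invented for illustration and are NOT
# extracted from the real TEI content models.
CONTENT_MODEL = {
    "body": ("div", "entry"),
    "div": ("head", "p", "div", "entry"),
    "entry": ("form", "sense"),
    "form": ("orth",),
    "sense": ("def", "sense"),
}


def nesting_path(model, start, target):
    """Breadth-first search over the containment graph.

    Returns a shortest chain of nested elements leading from `start` to
    `target`, or None once every element reachable from `start` has been
    visited without meeting `target`."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back up the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for child in model.get(node, ()):  # elements absent from the map contain nothing
            if child not in parents:
                parents[child] = node
                queue.append(child)
    return None


print(nesting_path(CONTENT_MODEL, "body", "def"))  # ['body', 'entry', 'sense', 'def']
print(nesting_path(CONTENT_MODEL, "entry", "p"))   # None
```

Because this toy graph ignores ordering, cardinality and attribute constraints, a returned path only shows that a nesting is not ruled out by containment alone, whereas a `None` result does rule it out.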

Context of the study

To put this research in context, this section describes the aims of the project from which it stems, then gives a short history of the term "encyclopedia" and underlines the known differences between dictionaries and encyclopedias that constitute the starting point of this investigation.

CollEx-Persée Project DISCO-LGE

The project (https://www.collexpersee.eu/projet/disco-lge/) set out to study La Grande Encyclopédie, Inventaire raisonné des Sciences, des Lettres et des Arts par une Société de savants et de gens de lettres (henceforth LGE), an encyclopedia published in France between 1885 and 1902 by an organised team of over two hundred specialists divided into eleven sections. This text comprises 31 tomes of about 1200 pages each and, according to @jacquet-pfau2015 [pp. 88 et seq.], was the last major French encyclopedic endeavour directly inheriting from its prestigious ancestor, the EDdA, published by Diderot and d'Alembert some 130 years earlier, between 1751 and 1772.

The aim of the project was to digitise LGE and make it available to the scientific community as well as to the general public. A previous version of this encyclopedia was partially available on Gallica (https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&collapsing=disabled&query=dc.relation%20all%20%22cb377013071%22), but it was of poor quality and its text had not been fully extracted from the page images with an Optical Character Recognition (OCR) system. This prevented an exhaustive study of the text with textometry tools such as TXM [@heiden2010]. As a prelude to project GEODE (https://geode-project.github.io/), the goal of DISCO-LGE was to produce a digital version of LGE of a quality comparable to that of the EDdA provided by the ARTFL (http://artfl-project.uchicago.edu/) project, in order to conduct a diachronic study of both encyclopedias.

Encyclopedia

While the word "encyclopedia" is now part of everyday vocabulary, with a meaning slightly different from that of "dictionary", it was much more unusual, and in fact controversial, when Diderot and d'Alembert decided to use it in the title of their work, even though they still had to pair it with "dictionary" in the full title of the EDdA. This work is probably the most famous of the genre and a symbol of the Age of Enlightenment.

The definition given by Furetière in his Dictionnaire Universel in 1690 is still close to the word's Greek etymology: a "ring of all knowledges", from κύκλος, "circle", and παιδεία, "knowledge". This is the meaning used for instance by Rabelais in Pantagruel, when he has Thaumaste declare that Panurge opened to him "le vray puys et abisme de Encyclopedie" ("the true well and abyss of Encyclopedia"). At that time, the word still mostly referred to the abstract concept of mastering all knowledge at once. Furetière adds that it is a quality one is unlikely to possess, and even seems to condemn its pursuit as a form of hubris: "C'est une témérité à un homme de vouloir posséder l'Encyclopédie" ("it is rash for a man to want to possess Encyclopedia").

Beyond this moral reproach, the concept that pleased Rabelais was somewhat dated by the end of the 17^th^ century, and the Dictionnaire Universel François et Latin, commonly referred to as the Dictionnaire de Trévoux, attacked it as utterly "burlesque" ("parodic"). Its entry for "Encyclopédie" remained unchanged in the four editions issued between 1721 and 1752, mocking the use of the word and discouraging readers from pursuing it. To that end, it quotes a poem by Pibrac urging people to specialise in a single discipline lest they fail to reach perfection, an argument that resembles the saying "Jack of all trades, master of none". It is all the more interesting that the definition remained unaltered until 1752, one year after the publication of the first volume of the EDdA. The Jesuits who edited the Dictionnaire de Trévoux frowned upon the EDdA, which they managed to have banned that same year by the Council of State on charges of attempting to destroy royal authority, inspiring rebellion and corrupting morality in general. There is much more at stake than words here, but the attempt to deprecate the word itself was part of their fight against the philosophers of the Enlightenment.