diff --git a/ICHLL_Brenon.md b/ICHLL_Brenon.md
index c091d5adffe708c1a6e85c0b54eccd24bc76eddb..0c574bd053345dfce0256c5c90d366e53e6ea82c 100644
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -69,63 +69,56 @@ semantics and philosophical considerations:
 
 (*a language dictionary, which appears to be only a word dictionary, must often
 be a thing dictionary when it is made properly*). A similar criticism is made by
-@haiman_dictionaries_1980 who attacks no less than six criteria on which
-dictionaries and encyclopedias are generally opposed to reach the conclusion
-that there is no distinction between them because "dictionaries *are*
+@haiman_dictionaries_1980 [p. 331], who attacks no fewer than six criteria on
+which dictionaries and encyclopedias are generally opposed and concludes that
+there is no distinction between them because "dictionaries *are*
 encyclopedias". Regardless of the validity of his reasoning, it only proves one
 inclusion: that perhaps, dictionaries would be a special case of encyclopedias.
-This, as will be evidenced, does by no means imply that encyclopedias are
+This, as will be shown, by no means implies that, conversely, encyclopedias are
 dictionaries.
 
-XML-TEI is a set of guidelines collectively developped by the
-@tei_consortium_tei_2023 under the form of XML schemas, along with a range of
-tools to handle them and training resources in order to represent text in a
-highly structured and machine-readable format. Its toolbox has a modular
-structure consisting of optional parts each covering specific needs such as the
-physical features of a source document, the transcription of oral corpora or
-particular requirements for textual domains like poetry, or, in the case at
-hand, dictionaries.
-
-After describing why the dedicated
-module was a natural candidate to consider, I formalise tools from graph
-theory to browse the specifications of this guideline in a rational way and
-explore this module in detail.
-
-
-@romary_formal_2007
-
-(@ide_encoding_1995 *dictionaries* only for western dictionaries) have been
-applied for both historical (@bohbot2018) and digitally native
-(@bowers_bridging_2018). In addition, a specific guidelines tailored at encoding
-dictionaries, TEI-Lex0, has been published [@banski_tei_lex0_2017].
-
-Systematic study of the guidelines @ide_background_1998 but here's a new method.
-
-Less than ten years after the beginings of the TEI, @ide_background_1998 gives a
-thorough account of the criteria
-
-
-# Dictionaries and encyclopedias
-
-After emerging over the course of the 18^th^ century, encyclopedias became a
-fertile subgenre in themselves and a rich subject of study to digital humanities
-for their particular relation to knowledge and its evolution. This section
-describes the goal of the project, then looks at the origin of the term
-"encyclopedia" itself before comparing the approaches of encyclopedias and
-dictionaries.
-
-## Context of the project
-
-CollEx-Persée project DISCO-LGE
+XML-TEI is a set of guidelines, tools and training resources collectively
+developed by the @tei_consortium_tei_2023 to represent text in a highly
+structured and machine-readable format. Its toolbox has a modular structure
+consisting of optional parts, each covering specific needs such as the physical
+features of a source document, the transcription of oral corpora or particular
+requirements for textual domains like poetry, or, in the case at hand,
+dictionaries. The intrinsic complexity of dictionaries has been recognised
+since the inception of the project [@tei_vault] and @ide_encoding_1995
+underlines the amount of work that went into the third version of the
+guidelines (P3) to provide a toolbox both general and expressive enough to
+account for the variety of conventions found in dictionaries.
+
+@romary_formal_2007 This module has been successfully used to encode both
+historical [@williams2017; @bohbot2018] and digitally native dictionaries
+[@bowers_bridging_2018]. In addition, a specific set of guidelines tailored to
+encoding dictionaries, named TEI-Lex0, has also been published [@banski_tei_lex0_2017].
+
+The TEI effort is described by @ide_background_1998 as "first steps" towards
+a standard for encoding corpora and a common basis for comparing and reusing
+them. They point out some minor inconsistencies in the design, remark that there
+is generally more than one way to encode a given text in XML-TEI and identify
+nine criteria for designing a sound standard. Their claims are backed by
+concrete examples of encoding situations but give no idea of how prevalent the
+issues found are. In fact, the sheer complexity of the guidelines can make it
+hard to ascertain whether a particular element structure is impossible to
+represent (not finding a suitable encoding is no proof that none exists).
+This chapter uses results from graph theory to systematically study the
+possibilities and shortcomings of the TEI *dictionaries* module.
+
+# Context of the study
+
+## CollEx-Persée Project DISCO-LGE
+
+The project ([https://www.collexpersee.eu/projet/disco-lge/](https://www.collexpersee.eu/projet/disco-lge/))
 set out to study *La Grande Encyclopédie, Inventaire raisonné des Sciences, des
-Lettres et des Arts par une Société de savants et de gens de lettres*, an
-encyclopedia published in France between 1885 and 1902 by an organised team of
-over two hundred specialists divided into eleven sections. This text comprises
-31 tomes of about 1200 pages each and according to @jacquet-pfau2015 [, pp. 88 et
-seq.] was the last major french encyclopedic endeavour directly inheriting from
-the prestigious ancestor that was the *Encyclopédie ou Dictionnaire raisonné des
-sciences des arts et des métiers* published by Diderot and d'Alembert 130 years
-earlier, between 1751 and 1772.
+Lettres et des Arts par une Société de savants et de gens de lettres* (henceforth
+*LGE*), an encyclopedia published in France between 1885 and 1902 by an
+organised team of over two hundred specialists divided into eleven sections.
+This text comprises 31 tomes of about 1200 pages each and, according to
+@jacquet-pfau2015 [, pp. 88 et seq.], was the last major French encyclopedic
+endeavour directly inheriting from the prestigious ancestor that was the *EDdA*
+published by Diderot and d'Alembert 130 years earlier, between 1751 and 1772.
 
 The aim of the project was to digitise and make *La Grande Encyclopédie*
 available to the scientific community as well as the general public. A previous
@@ -136,8 +129,8 @@ pictures with an Optical Characters Recognition (OCR) system. This prevented an
 exhaustive study of the text with textometry tools such as TXM [@heiden2010].
 As a prelude to project GEODE
 ([https://geode-project.github.io/](https://geode-project.github.io/)), the goal
-of CollEx-Persée was to produce a digital version of *La Grande Encyclopédie*
-with a quality comparable to the one of l'*Encyclopédie* provided by the ARTFL
+of CollEx-Persée was to produce a digital version of *LGE* with a quality
+comparable to that of the *Encyclopédie* provided by the ARTFL
 ([http://artfl-project.uchicago.edu/](http://artfl-project.uchicago.edu/))
 project in order to conduct a diachronic study of both encyclopedias.
 