Way too much has been improved without the security of versioning

Commit 3020cab8 authored 1 year ago by Alice Brenon
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -28,32 +28,89 @@ header-includes:
 {\small \textsuperscript{2} Univ Lyon, INSA Lyon, CNRS, UCBL, LIRIS, UMR5205, F-69621}\\
 \end{center}
-**Abstract** As witnesses to scientific progress, dictionaries and encyclopedias
+**Abstract** This chapter illustrates the fundamental differences between
-draw much interest from digital humanities, which accounts for the number of
+dictionaries and encyclopedias by documenting the process of devising an
-projects making them available to the public or studying them. However, the
+encoding scheme and applying it to a late-19^th^ century encyclopedia, "La
-volume of data involved issues a technical challenge to the digitizing process
+Grande Encyclopédie" (hereafter *LGE*). The effort, made in the context of project
-required for the study of historical dictionaries. The goal of project DISCO-LGE
+DISCO-LGE, consisted in working from an OCRised version of the pages in XML-ALTO
-was to study a late-19^th^ century encyclopedia, "La Grande Encyclopédie",
+to produce a fully XML-TEI-compliant encoding of the individual articles.
-working from an OCRised version in XML-ALTO up to an encoding suitable for an
+Although the TEI guidelines include a specialised module for dictionaries which
-automatic tool to represent and structure the text of encyclopedias. XML-TEI, a
+was identified as a promising tool for the task, systematic traversal of the
-major standard, includes a specialised module for dictionaries which was
-identified as a good candidate to build on, but systematic traversal of the
 schema using graph search methods revealed some limitations when used to encode
-this text. These shortcomings are described which leads to the identification of
+this text. These shortcomings are reviewed and illustrated on a series of
-the fundamental differences that prevent encoding encyclopedias with the XML-TEI
+examples. An alternative encoding remaining within the *core* module of TEI is
-module for dictionaries. Alternative encodings for encyclopedias including a
+then proposed and demonstrated on articles from *LGE* containing key features.
-fully XML-TEI-compliant scheme are then proposed along with a discussion of
+Finally, different strategies followed by other projects are discussed.
-their advantages and drawbacks.  .
 **Keywords** digital humanities, XML-TEI, dictionaries, encyclopedias
+# Introduction
+Although both terms have been used rather interchangeably over the past few
+centuries, a dichotomy is now commonly being made between dictionaries and
+encyclopedias. A simple opposition can easily justify this distinction:
+dictionaries define words and tell one how to use them, while encyclopedias
+usually develop at greater length to give a more comprehensive and scientific
+understanding of the concept being defined. This common intuition links back to
+the entry written in the *Encyclopédie ou Dictionnaire raisonné des sciences des
+arts et des métiers* (hereafter *EDdA*) by @dalembert_dictionnaire_2022 [article
+DICTIONNAIRE, volume 4], who distinguishes three kinds of dictionaries: one to define
+*words*, the second to define *facts* and the last one to define *things*,
+corresponding respectively to language, history, and science and arts
+dictionaries. The first type corresponds to our modern dictionaries while the
+two others are similar to what one expects to find in an encyclopedia.
+However, d'Alembert himself does not think of these boundaries as absolute and he
+hints at the extreme difficulty in merely defining words without going into
+semantics and philosophical considerations:
+> un dictionnaire de langues, qui paroît n'être qu'un dictionnaire de mots, doit
+> être souvent un dictionnaire de choses quand il est bien fait
+(*a language dictionary, which appears to be only a word dictionary, must often
+be a thing dictionary when it is made properly*). A similar criticism is made by
+@haiman_dictionaries_1980 who attacks no less than six criteria on which
+dictionaries and encyclopedias are generally opposed to reach the conclusion
+that there is no distinction between them because "dictionaries *are*
+encyclopedias". Regardless of the validity of his reasoning, it only proves one
+inclusion: that perhaps, dictionaries would be a special case of encyclopedias.
+This, as will be shown, by no means implies that encyclopedias are
+dictionaries.
+XML-TEI is a set of guidelines collectively developed by the
+@tei_consortium_tei_2023 in the form of XML schemas, along with a range of
+tools to handle them and training resources, in order to represent text in a
+highly structured and machine-readable format. Its toolbox has a modular
+structure consisting of optional parts each covering specific needs such as the
+physical features of a source document, the transcription of oral corpora or
+particular requirements for textual domains like poetry, or, in the case at
+hand, dictionaries.
+After describing why the dedicated
+module was a natural candidate to consider, I formalise tools from graph
+theory to browse the specifications of these guidelines in a rational way and
+explore this module in detail.
+The approach of @romary_formal_2007 and the *dictionaries* module as presented
+by @ide_encoding_1995 (which only targets western dictionaries) have been
+applied to both historical [@bohbot2018] and digitally native
+[@bowers_bridging_2018] dictionaries. In addition, a specific set of guidelines
+tailored to encoding dictionaries, TEI-Lex0, has been published [@banski_tei_lex0_2017].
+A systematic study of the guidelines had already been undertaken by
+@ide_background_1998, but a new method is proposed here. Less than ten years
+after the beginnings of the TEI, @ide_background_1998 gives a
+thorough account of the criteria
 # Dictionaries and encyclopedias
-After emerging from dictionaries during the 18^th^ century, encyclopedias became
+After emerging over the course of the 18^th^ century, encyclopedias became a
-a fertile subgenre in themselves and a rich subject of study to digital
+fertile subgenre in themselves and a rich subject of study for digital humanities
-humanities for their particular relation to knowledge and its evolution. In this
+for their particular relation to knowledge and its evolution. This section
-section we will describe the goal of our project, then look at the origin of the
+describes the goal of the project, then looks at the origin of the term
-term "encyclopedia" itself before comparing the approaches of encyclopedias and
+"encyclopedia" itself before comparing the approaches of encyclopedias and
 dictionaries.
 ## Context of the project
@@ -91,9 +148,9 @@ near synonyms to refer to books compiling vast amounts of knowledge into lists
 of definitions ordered alphabetically. Their similarity is even visible in the
 way they are coordinated in the full title of the *Encyclopédie* which is
 probably the most famous work of the genre and a symbol of the Age of
-Enlightenment. If the word "encyclopedia" is nowadays part of our vocabulary, it
+Enlightenment. If the word "encyclopedia" is nowadays part of everyday
-was much more unusual and in fact controversial when Diderot and d'Alembert
+vocabulary, it was much more unusual and in fact controversial when Diderot and
-decided to use it in the title of their book.
+d'Alembert decided to use it in the title of their book.
 The definition given by Furetière in his *Dictionnaire Universel* in 1690 is
 still close to its Greek etymology: a "ring of all knowledges", from *κύκλος*,
@@ -171,6 +228,8 @@ and remain within the linguistic level of things. Entries in a dictionary often
 feature information such as the part of speech, the pronunciation or the
 etymology of the word they define.
+# <FIXME
 The entry for "Dictionnaire" in the *Encyclopédie* distinguishes between three
 types of dictionaries: one to define *words*, the second to define *facts* and
 the last one to define *things*, corresponding to the distinction between
@@ -181,36 +240,44 @@ means of the coordinating conjunction "ou" to a *Dictionnaire raisonné*,
 "reasoned dictionary", introducing the idea of encyclopedias as dictionaries
 with additional structure and a philosophical dimension.
-Back to the "Encyclopédie" article we read that a dictionary remaining strictly
+# FIXME>
-at the language level, a vocabulary, can be seen as the empty frame required for
-an encyclopedic dictionary that will fill it with additional depth. Given how
+Returning to the "Encyclopédie" article, one reads that a dictionary remaining
-d'Alembert insists on the importance of brevity for a clear definition in the
+strictly at the language level, a vocabulary, can be seen as the empty frame
-"Dictionnaire de Langues" entry, it is clear that the *encyclopédistes* did not
+required for an encyclopedic dictionary that will fill it with additional depth.
-consider encyclopedias superior to dictionaries but really as a new subgenre
+Given how d'Alembert insists on the importance of brevity for a clear definition
-departing from them in terms of purpose.
+in the "Dictionnaire de Langues" entry, it is clear that the *encyclopédistes*
+did not consider encyclopedias superior to dictionaries but really as a new
+subgenre departing from them in terms of purpose.
 # The *dictionaries* TEI module {#sec:dictionaries-module}
-The XML-TEI standard has a modular structure consisting of optional parts each
+# <FIXME
+The XML-TEI toolbox has a modular structure consisting of optional parts each
 covering specific needs such as the physical features of a source document, the
 transcription of oral corpora or particular requirements for textual domains
-like poetry, or, in our case, dictionaries. After describing why the dedicated
+like poetry, or, in the case at hand, dictionaries. After describing why the dedicated
-module was a natural candidate to meet our needs, we formalise tools from
+module was a natural candidate to consider, I formalise tools from graph
-graph theory to browse the specifications of this standard in a rational way and
+theory to browse the specifications of these guidelines in a rational way and
 explore this module in detail.
+# FIXME>
-## A good starting point
+## A good starting point {#sec:starting-point}
 Data produced in the context of a project such as DISCO-LGE cannot be useful to
 future scientific projects unless it is *interoperable* and *reusable*. These
 are the last two key aspects of the FAIR
 ([https://www.go-fair.org/fair-principles/](https://www.go-fair.org/fair-principles/)) principles (*findability*,
-*accessibility*, *interoperability* and *reusability*) which we strive to follow
+*accessibility*, *interoperability* and *reusability*) which I strive to follow
-as a guideline for efficient and quality research. It entails using standard
+as a guideline for efficient, high-quality research.
+# <FIXME
+It entails using standard
 formats, and a standard for encoding historical texts in the context of digital
 humanities is XML-TEI, collectively developed by the *Text Encoding Initiative*
 consortium, which publishes a set of technical specifications in the form of
 XML schemas, along with a range of tools to handle them and training resources.
+# FIXME>
 The *dictionaries* module has been leveraged to encode dictionaries in projects
 NENUFAR
@@ -218,28 +285,30 @@ NENUFAR
 and BASNUM
 ([https://anr.fr/Projet-ANR-18-CE38-0003](https://anr.fr/Projet-ANR-18-CE38-0003))
 to encode respectively the *Petit Larousse Illustré* published by Pierre
-Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to our target encyclopedia
+Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*, and the
-and the *Dictionnaire Universel* by Furetière, or rather its second edition
+*Dictionnaire Universel* by Furetière, or rather its second edition edited by
-edited by Henri Basnage de Beauval, an encyclopedic dictionary from the very
+Henri Basnage de Beauval, an encyclopedic dictionary from the very early 18^th^
-early 18^th^ century [@williams2017, p. 1]. These successes made it a good starting
+century [@williams2017, p. 1]. These successes suggested it could be a useful tool
-point for our own encoding but the former does not have the encyclopedic
+to encode encyclopedias, but a few differences remained between these two projects and
-dimension our corpus has and the latter is a much older text which had a
+DISCO-LGE: the text studied by NENUFAR does not have the encyclopedic dimension
-tremendous influence on the european encyclopedic effort of the 18^th^ century
+*LGE* has and BASNUM studies a much older text which had a tremendous influence on the
-but is not as clearly separated from the dictionaric stem as *La Grande
+European encyclopedic effort of the 18^th^ century but is not as clearly
-Encyclopédie* is. For these reasons, we could not directly reuse the encoding
+separated from the dictionary stem as *La Grande Encyclopédie* is. For these
-schemes used in these projects and had to explore the XML-TEI schema
+reasons, the encoding schemes used in these projects could not be reused
-systematically to devise our own.
+directly, prompting a systematic exploration of the XML-TEI schema to devise
+a new one.
-In this chapter, we need to name and manipulate XML elements. We choose to
-represent them in a monospace font, in the standard XML autoclosing form within
+This chapter discusses XML elements in depth and hence needs to name and
-angle brackets and with a slash following the element name like `<div/>` for a
+manipulate them. They will be represented in a monospace font, in the standard
-`div` element
+XML self-closing form within angle brackets and with a slash following the
-([https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html](https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html)). We do not mean by this notation that they cannot contain
+element name like `<div/>` for a `div` element
-raw text or other XML elements, merely that we are referring to such an element,
+([https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html](https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html)).
-with all the subtree that spans from it in the context of a concrete document
+This notation does not imply that they cannot contain raw text or other
-instance or as an empty structure when we are considering the abstract element
+XML elements; it merely denotes such an element, without any additional
-and the rules that govern its use in relation to other elements or its
+assumption. In the context of a concrete document instance this can refer to the
-attributes.
+markup with the whole subtree that may span from it, but the same notation
+will be used when considering the abstract element and the rules that govern its
+use in relation to other elements or its attributes.
 ## A graph problem
@@ -249,26 +318,27 @@ almost 80 possible child elements (79.91) within any given element, manually
 browsing such a massive network can prove quite difficult as the number of
 combinations sharply increases with each step.
-We transform the problem by representing this network as a directed graph, using
+The problem can be advantageously transformed by representing this network as a
-elements of XML-TEI as nodes and placing edges if the destination node may be
+directed graph, using elements of XML-TEI as nodes and placing edges if the
-contained within the source node according to the schema. Please note that the
+destination node may be contained within the source node according to the
-word "element" is here used with the same meaning as in the TEI documentation to
+schema. Note that the word "element" is used here with the same meaning
-refer to the conceptual device characterised by a given tag name such as `p` or
+as in the TEI documentation to refer to the conceptual device characterised by a
-`div` and not to a particular instance of them that may occur in a given
+given tag name such as `p` or `div` and not to a particular instance of them
-document. Figure @fig:dictionaries-subgraph, by using this transformation to
+that may occur in a given document. Figure @fig:dictionaries-subgraph, by using
-display the *dictionaries* module, hints at the overall complexity of the whole
+this transformation to display the *dictionaries* module, hints at the overall
-specification.
+complexity of the whole specification.
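As an illustration, the transformation can be sketched in a few lines of Python. The `CONTENT_MODEL` dictionary below is a hypothetical, hand-written excerpt of the "may directly contain" relation, not data extracted from the actual TEI specification, and `build_graph` and `average_children` are illustrative helpers; a real study would presumably derive the edges from the published schema files instead.

```python
# Hypothetical excerpt of the TEI "may directly contain" relation, written by
# hand for illustration only (NOT extracted from the real schema).
CONTENT_MODEL = {
    "body": ["entry", "entryFree", "superEntry", "div"],
    "superEntry": ["entry"],
    "entry": ["gramGrp", "xr"],
    "entryFree": ["pos"],
    "gramGrp": ["pos"],
    "xr": ["ref"],
    "div": ["div", "head", "p"],
    "address": ["geogName"],
    "geogName": ["address"],
}


def build_graph(content_model):
    """Directed graph as adjacency sets: an edge parent -> child means that
    child may appear as a direct child of parent."""
    graph = {}
    for parent, children in content_model.items():
        graph.setdefault(parent, set()).update(children)
        for child in children:
            # Elements that cannot contain anything still become nodes.
            graph.setdefault(child, set())
    return graph


def average_children(graph):
    """Mean number of possible direct children per element (mean out-degree)."""
    return sum(len(children) for children in graph.values()) / len(graph)


graph = build_graph(CONTENT_MODEL)
print(len(graph), "elements,", round(average_children(graph), 2), "children on average")
```

On the full schema, the same `average_children` measure is what the figure of almost 80 possible children per element refers to.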
 ![The subgraph of the *dictionaries* module](ressources/dictionaries.png){height=830px #fig:dictionaries-subgraph}
 By repeatedly moving along the edges of that graph,
 that is, by considering the transitive closure of the relation "be connected by
-an edge" we define *inclusion paths* which allow us to explore which elements
+an edge", one defines *inclusion paths*, which make it possible to explore which elements may
-may be nested under which other.
+be nested under which others.
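For a single starting element, the transitive closure boils down to a plain breadth-first traversal. The sketch below relies on a small hypothetical excerpt of the schema, and `nestable_under` is an illustrative name; it returns every element that may appear at any depth below the given one.

```python
from collections import deque

# Hypothetical hand-written excerpt of the "may directly contain" relation.
GRAPH = {
    "body": {"entry", "entryFree", "div"},
    "entry": {"gramGrp", "xr"},
    "entryFree": {"pos"},
    "gramGrp": {"pos"},
    "xr": {"ref"},
    "div": {"div", "head", "p"},
}


def nestable_under(graph, root):
    """Elements reachable from root through one or more edges, i.e. the
    image of root under the transitive closure of "may directly contain"."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        element = queue.popleft()
        if element not in seen:
            seen.add(element)
            queue.extend(graph.get(element, ()))
    return seen


print(sorted(nestable_under(GRAPH, "entry")))  # ['gramGrp', 'pos', 'ref', 'xr']
```

The root itself only belongs to the result when it lies on a cycle, as `div` does in this excerpt.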
 The nodes visited along the way represent the intermediate XML elements needed to
 construct a valid XML tree according to the TEI schema. Given the top-down
-semantics of those trees, we call the length of an inclusion path its *depth*.
+semantics of those trees, the length of an inclusion path will be called its
+*depth*.
 The ability for an element to contain itself corresponds directly to loops on
 the graph (that is an edge from a node to itself) as can be illustrated by the
@@ -276,17 +346,17 @@ the graph (that is an edge from a node to itself) as can be illustrated by the
 another one.
 The generalisation of this to inclusion paths of any length greater than one is
-usually called a cycle and we may be tempted in our context to refine this and
+usually called a cycle and it appears natural to refine this and name them
-name them *inclusion cycles*. The `<address/>` element provides us with an
+*inclusion cycles*. The `<address/>` element provides an example of this
-example for this configuration: although an `<address/>` element may not
+configuration: although an `<address/>` element may not directly contain another
-directly contain another one, it may contain a `<geogName/>` which, in turn, may
+one, it may contain a `<geogName/>` which, in turn, may contain a new
-contain a new `<address/>` element. From a graph theory perspective, we can say
+`<address/>` element. From a graph theory perspective, one can say that it
-that it admits an inclusion cycle of length two.
+admits an inclusion cycle of length two.
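Loops and inclusion cycles can both be found with a breadth-first search that starts from an element and stops as soon as an edge leads back to it. Since every edge counts as one step, the first cycle found is a shortest one. In this sketch the excerpt of the schema is hypothetical (the `<address/>`/`<geogName/>` edges mirror the example above), and `shortest_cycle` is an illustrative helper.

```python
from collections import deque

# Hypothetical excerpt; the address/geogName edges mirror the example in the text.
GRAPH = {
    "div": {"div", "head", "p"},
    "address": {"geogName", "street"},
    "geogName": {"address"},
}


def shortest_cycle(graph, start):
    """Shortest inclusion cycle through start as a list of elements
    (first and last items are start), or None if start lies on no cycle."""
    parents = {start: None}  # BFS tree: element -> element it was reached from
    queue = deque([start])
    while queue:
        element = queue.popleft()
        for child in sorted(graph.get(element, ())):
            if child == start:
                # Walk the BFS tree back up to start, then close the cycle.
                path = [element]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1] + [start]
            if child not in parents:
                parents[child] = element
                queue.append(child)
    return None


print(shortest_cycle(GRAPH, "div"))      # ['div', 'div']: a loop
print(shortest_cycle(GRAPH, "address"))  # ['address', 'geogName', 'address']
```

The length of the cycle is `len(cycle) - 1`: 1 for a loop, 2 for the `<address/>` case.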
 Using classical, well-known methods such as Dĳkstra's algorithm [@dĳkstra59]
-allows us to explore the shortest inclusion paths that exist between elements.
+lets one explore the shortest inclusion paths that exist between elements.
 Though particular caution is warranted because there is no guarantee that
-the shortest path is meaningful in general, it at least provides us with an
+the shortest path is meaningful in general, it at least provides an
 efficient way to check whether or not a given element may be nested at all under
 another one and gives a lower bound on the length of the path to expect. Of
 course the accuracy of this heuristic decreases as the length of the elements
@@ -297,7 +367,7 @@ This is still very useful when taking into account the fact that TEI modules are
 merely "bags" to group the elements and provide hints to human encoders about
 the tools they might need but have no implication on the inclusion paths between
 elements which cross module boundaries freely. The general graph formalism
-enables us to describe complex filtering patterns and to implement queries to
+enables one to describe complex filtering patterns and to implement queries to
 look for them among the elements exhaustively by algorithmic means even when the
 shortest-path approach is not enough.
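Because the edges of this graph carry no weight, Dijkstra's algorithm gives the same distances as a simple breadth-first search, which the following sketch uses to find a shortest inclusion path between two distinct elements. The excerpt of the schema is hypothetical, and `shortest_inclusion_path` is an illustrative name rather than part of any existing tool.

```python
from collections import deque

# Hypothetical hand-written excerpt of the "may directly contain" relation.
GRAPH = {
    "body": {"entry", "entryFree", "div"},
    "entry": {"form", "gramGrp", "sense"},
    "entryFree": {"pos"},
    "gramGrp": {"pos"},
    "div": {"div", "head", "p"},
}


def shortest_inclusion_path(graph, source, target):
    """A shortest list of nested elements leading from source down to a
    distinct target, or None when target can never be nested under source.

    With unit edge weights, breadth-first search finds the same distances
    as Dijkstra's algorithm, without needing a priority queue."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        element = queue.popleft()
        for child in sorted(graph.get(element, ())):
            if child in parents:
                continue
            parents[child] = element
            if child == target:
                path = [child]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(child)
    return None


path = shortest_inclusion_path(GRAPH, "body", "pos")
print(f"{' > '.join(path)} (depth {len(path) - 1})")  # body > entryFree > pos (depth 2)
```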
@@ -316,12 +386,12 @@ A last relevant example on the use of these methods can be given by querying the
 shortest inclusion path of a `<pos/>` under the `<body/>` of the document: it
 yields an inclusion directly through `<entryFree/>` (with an inclusion path of
 length 2), which unlike `<entry/>` accepts it as a direct child node. Possibly
-not what we want depending on the regularity of the articles we are encoding and
+not what is wanted depending on the regularity of the articles being encoded and
 the occurrence of other grammatical information such as `<case/>` or `<gen/>` to
 justify the use of the `<gramGrp/>`, but searching exhaustively for paths up to
-length 3 returns as expected the path through `<entry/>`, among others. Overall,
+length 3 returns as expected the path through `<entry/>`, among others. The big
-we get a good general idea: `<pos/>` does not need to be nested very deep, it
+picture starts to appear: `<pos/>` does not need to be nested very deep, it can
-can appear quite near the "surface" of article entries.
+appear quite near the "surface" of article entries.
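The exhaustive search for paths up to a given length can be sketched as a depth-first walk. Since elements may contain themselves, a path may revisit an element, and only the depth bound guarantees termination. The excerpt of the schema is hypothetical and `inclusion_paths` is an illustrative helper.

```python
# Hypothetical hand-written excerpt of the "may directly contain" relation.
GRAPH = {
    "body": {"entry", "entryFree", "div"},
    "entry": {"form", "gramGrp", "sense"},
    "entryFree": {"pos"},
    "gramGrp": {"pos"},
    "div": {"div", "head", "p"},
}


def inclusion_paths(graph, source, target, max_depth):
    """All inclusion paths from source to target whose depth (number of
    edges) is between 1 and max_depth, in depth-first order."""
    paths = []

    def walk(path):
        if len(path) - 1 >= max_depth:
            return  # depth bound reached: loops such as div > div stop here
        for child in sorted(graph.get(path[-1], ())):
            extended = path + [child]
            if child == target:
                paths.append(extended)
            walk(extended)

    walk([source])
    return paths


for path in sorted(inclusion_paths(GRAPH, "body", "pos", 3), key=len):
    print(len(path) - 1, " > ".join(path))
# 2 body > entryFree > pos
# 3 body > entry > gramGrp > pos
```

With `max_depth=2` only the path through `entryFree` is returned, matching the shortest-path result. Raising the bound to 3 also brings out the path through `entry`.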
 ## Content of the module
@@ -333,15 +403,15 @@ element to the dictionary module: indeed, although `<body/>` may also contain
 `<entry/>` while the latter is a device to group several related entries
 together. Both can contain an `<entry/>` directly while no obvious inclusion
 exists the other way around: most (> 96.2%) of the inclusion paths of
-"reasonable" depth (which we define as strictly inferior to 5, that is twice the
+"reasonable" depth (which will be arbitrarily defined as strictly less than 5,
-average shortest depth between any two nodes) either include `<figure/>` or
+that is twice the average shortest depth between any two nodes) either include
-`<castList/>`, two very specific elements which should not need to appear in an
+`<figure/>` or `<castList/>`, two very specific elements which should not need
-article in general, showing that the purpose of `<entry/>` is not to contain an
+to appear in an article in general, showing that the purpose of `<entry/>` is
-`<entryFree/>` or `<superEntry/>`. Hence, not only the semantics conveyed by the
+not to contain an `<entryFree/>` or `<superEntry/>`. Hence, not only the
-documentation but also the structure of the elements graph evidence `<entry/>`
+semantics conveyed by the documentation but also the structure of the elements
-as the natural top-most element for an article. This somewhat contrived example
+graph evidence `<entry/>` as the natural top-most element for an article. This
-hopes to further demonstrate the application of a graph-centred approach to
+somewhat contrived example is meant to further demonstrate the application of a
-understand the inner workings of the XML-TEI schema.
+graph-centred approach to understanding the inner workings of the XML-TEI schema.
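Proportions such as the share of paths going through `<figure/>` or `<castList/>` can be obtained by filtering the result of a bounded enumeration. In the sketch below, the graph is a deliberately tiny, invented example. It does not reproduce the TEI schema, and its figures are unrelated to the 96.2% reported above. `share_through` is a hypothetical helper. "Reasonable" depth is taken, as in the text, to mean strictly less than 5.

```python
# Invented toy graph, NOT the TEI schema: it only gives the filter something to count.
GRAPH = {
    "entry": {"sense", "note"},
    "sense": {"figure"},
    "note": {"castList", "entryFree"},
    "figure": {"entryFree"},
    "castList": {"entryFree"},
}


def inclusion_paths(graph, source, target, max_depth):
    """All inclusion paths from source to target of depth 1..max_depth."""
    paths = []

    def walk(path):
        if len(path) - 1 >= max_depth:
            return
        for child in sorted(graph.get(path[-1], ())):
            extended = path + [child]
            if child == target:
                paths.append(extended)
            walk(extended)

    walk([source])
    return paths


def share_through(paths, intermediates):
    """Fraction of paths with at least one of `intermediates` strictly
    between their two ends (0.0 when there is no path at all)."""
    if not paths:
        return 0.0
    hits = sum(1 for path in paths if any(e in intermediates for e in path[1:-1]))
    return hits / len(paths)


paths = inclusion_paths(GRAPH, "entry", "entryFree", max_depth=4)  # depth < 5
print(len(paths), share_through(paths, {"figure", "castList"}))
```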
 Once a block for an article is created, it may contain elements useful to
 represent several of its features. Its written and spoken forms are usually
@@ -370,7 +440,7 @@ redirection, with an imperative locution like "please see […]".
 The "active" part of the cross-reference, that is the very word within the
 `<xr/>` that is considered to be the link or, to make a modern-day HTML
 metaphor, the region that would be clickable, is represented by a `<ref/>`
-element. Though it is not specific to the *dictionaries* module, we include it
+element. Though it is not specific to the *dictionaries* module, it is included
 in this description of the toolbox because it is particularly useful in the
 context of dictionaries. This element may have a target attribute which points
 to the other resource to be accessed by the interested reader.
@@ -387,7 +457,7 @@ under the `<entry/>`.
 Before concluding this description of the *dictionaries* module from the
 perspective of someone trying to concretely encode a particular dictionary or
-encyclopedia, we make use of the graph approach again to evidence some its
+encyclopedia, the graph approach is again leveraged to reveal some of its
 aspects in terms of inclusion structure.
 First, it is remarkable that all elements in the *dictionaries* module have a
@@ -405,25 +475,25 @@ official documentation. Among those (shortest) cycles, 20 include the `<cit/>`
 element made to group quotations with a bibliographic reference to their source
 which should clearly be unnecessary to encode an article in the general case.
-Secondly, although we have seen examples of connections from this module to the
+Secondly, although examples of connections from this module to the rest of the
-rest of the XML-TEI, especially to the *core* module (to which belongs for
+XML-TEI have been evidenced in this section, especially to the *core* module (to
-example the `<ref/>` element), the *dictionaries* module appears somewhat
+which, for example, the `<ref/>` element belongs), the *dictionaries* module
-isolated from important structural elements like `<head/>` or `<div/>`. Indeed,
+appears somewhat isolated from important structural elements like `<head/>` or
-computing all the paths from either `<entry/>` or `<sense/>` elements to the
+`<div/>`. Indeed, computing all the paths from either `<entry/>` or `<sense/>`
-latter of length shorter or equal to 5 by a systematic traversal of the graph
+elements to the latter, of length at most 5, by a systematic traversal
-yields exclusively paths (respectively 9042 and 39093 of them) containing either
+of the graph yields exclusively paths (respectively 9042 and 39093 of them)
-a `<floatingText/>` or an `<app/>` element. The first one, as its name aptly
+containing either a `<floatingText/>` or an `<app/>` element. The first one, as
-suggests, is used to encode text that does not quite fit the regular flow of the
+its name aptly suggests, is used to encode text that does not quite fit the
-document, as for example in the context of an embedded narrative. Both examples
+regular flow of the document, as for example in the context of an embedded
-displayed in the online documentation feature a `<body/>` as direct child of
+narrative. Both examples displayed in the online documentation feature a
-`<floatingText/>`, neatly separating its content as independent. The purpose of
+`<body/>` as direct child of `<floatingText/>`, neatly separating its content as
-the second one, although its name — short for apparatus — is less clear, is to
+independent. The purpose of the second one, although its name — short for
-wrap together several versions of the same excerpts, for instance when there are
+apparatus — is less clear, is to wrap together several versions of the same
-several possible readings of an unclear group of words in a manuscript, or when
+excerpts, for instance when there are several possible readings of an unclear
-the encoder is trying to compile a single version of a piece of work from
+group of words in a manuscript, or when the encoder is trying to compile a
-several sources which disagree over some passage. In both case, it appears
+single version of a piece of work from several sources which disagree over some
-obvious that it is not something that is expected to occur naturally in the
+passage. In both cases, it appears obvious that it is not something that is
-course of an article in general.
+expected to occur naturally in the course of an article in general.
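Instead of enumerating thousands of paths, a claim of the form "every path of depth at most 5 goes through `<floatingText/>` or `<app/>`" can also be checked another way. One forbids those elements and tests whether the target is still reachable within the bound. The sketch below shows this alternative with a hypothetical excerpt of the schema and an illustrative `reachable_within` helper. It answers the yes/no question but does not count paths.

```python
from collections import deque

# Hypothetical hand-written excerpt of the "may directly contain" relation.
GRAPH = {
    "sense": {"note", "cit"},
    "note": {"floatingText", "app"},
    "floatingText": {"body"},
    "body": {"div"},
    "app": {"lem"},
    "lem": {"div"},
    "cit": {"quote"},
}


def reachable_within(graph, source, target, max_depth, banned=frozenset()):
    """True if target may be nested under source in at most max_depth steps
    without any element of `banned` as an intermediate element."""
    depth = {source: 0}  # breadth-first search records shortest depths
    queue = deque([source])
    while queue:
        element = queue.popleft()
        if depth[element] == max_depth:
            continue  # children would lie beyond the depth bound
        for child in graph.get(element, ()):
            if child == target:
                return True
            if child in banned or child in depth:
                continue
            depth[child] = depth[element] + 1
            queue.append(child)
    return False


print(reachable_within(GRAPH, "sense", "div", 5))  # True
# False: every path of depth <= 5 from sense to div uses floatingText or app.
print(reachable_within(GRAPH, "sense", "div", 5, banned={"floatingText", "app"}))
```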
 Thus, despite a rather dense internal connectivity, the *dictionaries* module
 fails to provide encoders with a device to represent recursively nesting
@@ -432,21 +502,21 @@ structures like `<div/>`.
 # A new standard?
 Studying the content of *La Grande Encyclopédie* and considering several
-articles in particular, we identify structures which are specific to
+articles in particular, one can identify structures which are specific to
 encyclopedias and not compatible with the *dictionaries* module presented in the
-previous section. We hence conclude that this module is not able to encode
+previous section. It follows that this module is not able to encode arbitrary
-arbitrary encyclopedic content and propose a new fully TEI-compliant encoding
+encyclopedic content, and a new fully TEI-compliant encoding scheme, remaining
-scheme remaining outside of it. We proceed with remarks about the needs of
+outside of it, is proposed. The rest of the section is concerned with the needs of
-automated encoding processes and compare our proposal with other strategies to
+automated encoding processes and compares the proposal with other strategies to
 overcome the issues previously identified with the dedicated module for
 dictionaries.
 ## Idiosyncrasies of encyclopedias
 Browsing through the pages of an encyclopedia reveals a certain number of
-noticeable differences. It is difficult to make a precise list because the
+noticeable differences. A comprehensive list would be difficult to draw because
-editorial choices may vary greatly between encyclopedias but we discuss some of
+of the great variety of editorial choices, but the most obvious ones can be
-the most obvious.
+discussed.
 The first immediately visible feature that sets encyclopedias apart from
 dictionaries and can be found in the *Encyclopédie* as well as in *La Grande
@@ -456,24 +526,24 @@ system. Those generally cover a broad range of subjects from scientific
 disciplines to literature, and extending to political subjects and law.
 No element in the *dictionaries* module is explicitly designed for the purpose
-of encoding these indicators. As we have seen, the elements set is geared
+of encoding these indicators. As section @sec:dictionaries-module illustrates,
-towards the words themselves instead of the concept they represent. The closest
+the element set is geared towards the words themselves rather than the concepts
-tool for what we need is found in the `<usg/>` element used with a specific
+they represent. The tool closest to what is needed can be found in the `<usg/>`
-`type` attribute set to `dom` for "domain". Indeed several examples from the
+element used with a specific `type` attribute set to `dom` for "domain". Indeed
-documentation encode subject indicators very similar to the ones found in
+several examples from the documentation encode subject indicators very similar
-encyclopedias within this element, but the match is not perfect either: all
+to the ones found in encyclopedias within this element, but the match is not
-appear within one of multiple senses, as if to clarify each context in which the
+perfect either: all appear within one of multiple senses, as if to clarify each
-word can be used, as expected from the element's name, "usage". In
+context in which the word can be used, as expected from the element's name,
-encyclopedias, if the domain indicator does in certain cases help to distinguish
+"usage". In encyclopedias, if the domain indicator does in certain cases help to
-between several entries sharing the same headword, the concept itself has
+distinguish between several entries sharing the same headword, the concept
-evolved beyond this mere distinction. Looking back at the *Encyclopédie*, the
+itself has evolved beyond this mere distinction. Looking back at the
-adjective *raisonné* in the rest of the title directly introduces a notion of
+*Encyclopédie*, the adjective *raisonné* in the rest of the title directly
-structure that links back to the "Systême figuré des connoissances humaines"
+introduces a notion of structure that links back to the "Systême figuré des
-[@blanchard2002, p. 1] which schematic structure is shown in Figure
+connoissances humaines" [@blanchard2002, p. 1] whose schematic structure is
-@fig:systeme-figure. The authors have devised a branching system to classify all
+shown in Figure @fig:systeme-figure. The authors have devised a branching system
-knowledge, and the occurrence at the beginning of articles, more than a tool to
+to classify all knowledge, and the occurrence of these indicators at the beginning of articles, more
-clear up possible ambiguities also points the reader to the correct place in
+than a tool to clear up possible ambiguities, also points the reader to the
-this mind map.
+correct place in this mind map.
 !["Systême figuré des connoissances humaines", the taxonomy at the heart of the Encyclopédie ([Wikimedia Commons](https://commons.wikimedia.org/wiki/File:ENC_SYSTEME_FIGURE.jpeg?uselang=fr#filelinks))](ressources/arbre.png){width=300px #fig:systeme-figure}
@@ -537,17 +607,17 @@ which are in turn generally developed over several paragraphs.
 ![La Grande Encyclopédie, tome 16, article "Europe", spanning from p.782 to p.846, that is 64 pages, and ending after a bibliography longer than one column of text ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/europe_t16.png){#fig:europe}
-The nested structure that we have just evidenced demands of course a nesting
+The nested structure that has just been evidenced demands of course a nesting
-structure to accomodate it. More precisely it guides our search of XML elements
+structure to accommodate it. More precisely, it guides the search for XML elements
-by giving us several constraints: we are looking for a pair of elements, the
+by adding several constraints: what is required is a pair of elements. The first
-first representing a (sub)section must be able to include both itself and the
+one, representing a (sub)section, must be able to include both itself and the
-second element, which does not have any special constraint except the one to
+second one, which does not have any special constraint except that of having a
-have a semantics compatible with our purpose of using it to represent section
+semantics compatible with the purpose of being used to represent section titles.
-titles. In addition, the first element must be able to contain several `<p/>`
+In addition, the first element must be able to contain several `<p/>` elements,
-elements, `<p/>` being the reference element to encode paragraphs according to
+`<p/>` being the reference element to encode paragraphs according to the XML-TEI
-the XML-TEI documentation.
+documentation.
-We have seen that the *dictionaries* module was equiped with a questionable but
+The *dictionaries* module has been shown to be equipped with a questionable but
 possible element for subject domains. However, it does not include any element
 for section titles. In the rest of the TEI specification, the elements `<head/>`
 and `<title/>` — the latter with the possibility to set its `type` attribute to
@@ -562,41 +632,42 @@ article with an `<entryFree/>`, an element supposed to relax some constraint to
 accommodate more unusual structures in dictionaries does not bring any
 improvement.
-The lack of results from these simple queries forces us to somewhat release the
+The lack of results from these simple queries forces one to somewhat relax the
-constraints on the encoding we are willing to use. We can for instance make the
+constraints on the encoding one is willing to use. The occurrence of an
-asumption that the occurrence of an intermediate element could be needed between
+intermediate element could for instance be needed between the element wrapping
-the element wrapping the whole article and the recursing one used to encode each
+the whole article and the recursive one used to encode each section. This
-section. This "section" element could also need a companion element to be able
+"section" element could also need a companion element to be able to include
-to include itself, or, to formalise it in terms of graph theory, we could relax
+itself, or, to formalise it in terms of graph theory, the condition that this
-the condition that this element admits a loop to consider instead cycles of a
+element admits a loop could be relaxed, considering instead that cycles of a given
-given (small, this still needs to represent a fairly direct inclusion) length to
+(small, since they must still represent a fairly direct inclusion) length are
-be enough. We simultaneously extend the maximum depth of the inclusion paths we
+enough. Simultaneously, the maximum depth of the inclusion paths between
-are looking for between `<entry/>`, the pair of elements and the `<p/>` element.
+`<entry/>`, the pair of elements and the `<p/>` element will be increased to
+yield more results.
 By setting this depth to 3, that is, by accepting one intermediate element to
 occur in the middle of each one of the inclusion paths that define the structure
-required to encode encyclopedic discourse, we find 21 elements but none of them
+required to encode encyclopedic discourse, 21 elements can be found, none of
-stand out as an obvious good solution: all paths to include the `<p/>` element
+which stands out as an obvious good solution: all paths to include the `<p/>`
-from any *dictionaries* element either contains a `<figure/>` (which we have
+element from any *dictionaries* element either contain a `<figure/>` (already
-encountered earlier when we were practising our graph approach to search for
+discussed in section @sec:dictionaries-module when practising the graph approach
-inclusions between `<entry/>` and `<entryFree/>` and dismissed as not useful in
+to search for inclusions between `<entry/>` and `<entryFree/>` and dismissed as
-general), a `<stage/>` (reserved to stage direction in dramatic works) or a
+not useful in general), a `<stage/>` (reserved for stage directions in dramatic
-`<state/>` (used to describe a temporary quality in a person or place), again
+works) or a `<state/>` (used to describe a temporary quality in a person or
-not even close to what we want. The paths to either `<head/>` or `<title/>` are
+place), again not even close to what is wanted. The paths to either `<head/>` or
-similarly disappointing. Again, changing `<entry/>` for `<entryFree/>` returns
+`<title/>` are similarly disappointing. Again, changing `<entry/>` for
-the exact same candidates. If that is not a thorough proof that none of these
+`<entryFree/>` returns the exact same candidates. If that is not a definite
-elements could fulfill our purpose, it is a fact than no element in this module
+proof that none of these elements could meet the investigated criteria, it is a fact
-appears as an obvious good solution and a serious hint to keep looking somewhere
+that no element in this module stands out as the obvious good solution and a
-else.
+serious hint to keep looking somewhere else.
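The bounded search described above can be sketched as a depth-first enumeration of simple paths in an inclusion graph. This is only an illustration under stated assumptions: `TOY_GRAPH` is a hypothetical, hand-written excerpt (not extracted from the TEI schema), `inclusion_paths` and its `max_nodes` parameter are names introduced here, and path length is assumed to be counted in elements, so that `max_nodes=3` accepts exactly one intermediate element, as with the depth of 3 used above. Calling the function with the same element as source and target lists the short cycles through that element, which corresponds to the relaxed version of the loop condition.

```python
from typing import Dict, List, Set

# Toy inclusion graph: an edge a -> b means "<b/> may appear directly inside <a/>".
# HYPOTHETICAL excerpt written by hand for illustration; it is not extracted
# from the TEI schema and is far from complete.
TOY_GRAPH: Dict[str, Set[str]] = {
    "entry": {"sense", "figure", "note"},
    "sense": {"sense", "usg", "note"},
    "note": {"note", "p"},
    "figure": {"head", "p"},
}


def inclusion_paths(graph: Dict[str, Set[str]], source: str, target: str,
                    max_nodes: int) -> List[List[str]]:
    """All inclusion paths from source to target containing at most max_nodes
    elements (both ends included), without revisiting an element. With
    source == target, the paths returned are the cycles through source."""
    results: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if len(path) >= max_nodes:
            return  # no room left for another element on this path
        for child in sorted(graph.get(path[-1], ())):
            if child == target:
                results.append(path + [child])
            elif child not in path:
                walk(path + [child])

    walk([source])
    return results


# max_nodes=3 accepts exactly one intermediate element between both ends.
print(inclusion_paths(TOY_GRAPH, "entry", "p", 3))
# [['entry', 'figure', 'p'], ['entry', 'note', 'p']]
# A loop (<note/> directly inside <note/>) is a cycle of 2 elements.
print(inclusion_paths(TOY_GRAPH, "note", "note", 2))
# [['note', 'note']]
```

In practice the graph would first have to be derived from the TEI schema itself; that step is not shown here.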
-We hence widen our search to include elements outside the *dictionaries* module
+Therefore, the search is extended again to include elements outside the
-which could be used to encode our sections and subsections, under the same
+*dictionaries* module which could be used to encode the sections and
-constraint as before to try and find a composite solution that would remain
+subsections, under the same constraint as before to try and find a composite
-under the `<entry/>` element even if resorting to subcomponents outside of the
+solution that would remain under the `<entry/>` element even if resorting to
-dedicated module. Only three elements are returned: `<figure/>`, `<metamark/>`
+subcomponents outside of the dedicated module. Only three elements are returned:
-and `<note/>`.
+`<figure/>`, `<metamark/>` and `<note/>`.
-The first one as we have repeatedly underlined is meant for graphic information
+The first one, as has been repeatedly underlined, is meant for graphic information
 and is not suitable for text content in general.
 The purpose of `<metamark/>` is to transcribe the edition marks that may appear
@@ -605,14 +676,14 @@ suggest an alternative reading (deletion, insertion, reordering, this is about a
 human editing the text from a given physical copy of it), but it is
 unfortunately of no use to encode a section of an article.
-The first element that might at least resemble what we are looking for is the
+The first element that might at least seem acceptable is the last one,
-last one, `<note/>`. It is meant to contain text, is about explaning something
+`<note/>`. It is meant to contain text, is about explaining something and seems
-and seems general enough (not specific to a given genre, or to the occurrence of
+general enough (not specific to a given genre, or to the occurrence of a
-a particular object on the page). Unfortunately, its semantics still seems a bit
+particular object on the page). Unfortunately, its semantics still seems a bit
-off compared to our need. The documentation describes it as an "additional
+off compared to what is required. The documentation describes it as an
-comment" which appears "out of the main textual stream" whereas the long
+"additional comment" which appears "out of the main textual stream" whereas the
-developments in articles are the very matter of the text of encyclopedias, not
+long developments in articles are the very matter of the text of encyclopedias,
-mere remarks in the margins or at the foot of pages.
+not mere remarks in the margins or at the foot of pages.
 ## Encoding within the *core* module {#sec:core-module}
@@ -620,63 +691,75 @@ The remarks made in section @sec:dictionaries-module explain why the
 *dictionaries* module is unable to represent encyclopedias, where the notion of
 "meaning" is less central than in dictionaries and where discourse with nested
 structures of arbitrary depth can occur. Even composite encodings using elements
-outside of the *dictionaries* module under an `<entry/>` element do not meet our
+outside of the *dictionaries* module under an `<entry/>` element do not meet the
-requirements. Since the *core* module obviously accomodates these structures by
+requirements of the project. Since the *core* module obviously accommodates these
-means of the `<div/>`, `<head/>` and `<p/>` elements which have the additional
+structures by means of the `<div/>`, `<head/>` and `<p/>` elements which have
-advantage of carrying less semantical payload than `<sense/>` or `<def/>` we
+the additional advantage of carrying less semantic payload than `<sense/>` or
-devise an encoding scheme using them which we recommend using for other projects
+`<def/>`, these elements will be used to devise an encoding scheme which can be
-aiming at representing encyclopedias.
+recommended for other projects aiming at representing encyclopedias.
-To remain consistent with the way we studied the *dictionaries* module we will
+To remain consistent with the way the *dictionaries* module was studied, only
-only concern ourselves with what happens at the level of each article, right
+what happens at the level of each individual article will be considered, that is
-under the `<body/>` element.  Everything related to metadata happens as expected
+right under the `<body/>` element representing a whole volume. Everything
-in the file's `<teiHeader/>` which is well-enough equiped to handle them. In
+related to its metadata happens as expected in the file's `<teiHeader/>` which
-order to present our scheme throughout the following section we will be
+is well enough equipped to handle it. In order to present the scheme throughout
-progressively encoding a reference article, "Cathète" from tome 9 reproduced in
+the following section, a reference article, "Cathète" from tome 9, reproduced in
-Figure @fig:cathete-photo.
+Figure @fig:cathete-photo, will be progressively encoded.
 ![La Grande Encyclopédie, tome 9, article "Cathète" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/cathète_t9.png){#fig:cathete-photo}
 Remaining within the *core* module for the structure, almost all useful elements
-are available and our encoding scheme merely quotes the official documentation.
+are available and practically no additional documentation is needed beyond the
-Each article is represented by a `<div/>`. We suggest setting an `xml:id`
+official TEI guidelines. Each article is represented by a `<div/>`. Setting an
-attribute on it with the head word of the entry — unique in the whole corpus, or
+`xml:id` attribute on it with a unique value will make it easier to identify,
-made so by suffixing a number representing its rank among the various
+browse and retrieve the articles from the encoded corpus. An auto-incremented
-occurrences, even when there's only one for the sake of regularity — as its
+serial number would of course provide an appropriate value for such a unique
-value, normalised to lowercase, stripping spaces and replacing all
+attribute but has some drawbacks: as long as the segmentation of articles is not
+fixed (which could happen if choices regarding entries and sub-entries were to
+change during a project or if, as in the case of DISCO-LGE, the automatic
+segmentation went through successive improvement steps), the identifiers of
+articles would change massively from one version to the next, even for articles
+segmented correctly. Given
+the iterative nature of many studies in digital humanities, this would make it
+harder to use results found early in a project. For this reason, the values used
+for `xml:id` in project DISCO-LGE depend only on the local quality of the
+segmentation and remain globally stable. They are computed as the head word of
+the entries normalised to lowercase, stripping spaces and replacing all
 non-alphanumerical characters by a dash (`'-'`) to avoid issues with the XML
-encoding. Figure @fig:cathete-xml-0 illustrates this choice for the container
+encoding, and suffixed by a serial number to distinguish between the few entries
-element on the article "Cathète" previously displayed.
+sharing the same headword. Thus, if an oversegmentation or an undersegmentation
+is fixed (meaning respectively that two "articles" get merged or that one
+"article" actually contained several which get split as such) only articles with
+the same headword are impacted. Figure @fig:cathete-xml-0 illustrates this
+choice for the container element on the article "Cathète" previously displayed.
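A minimal sketch of this naming rule, written in Python for illustration only (it is not the code of `soprano`): `normalise_headword` and `assign_ids` are hypothetical names, and both the dash separating the headword from its serial and the serial starting at 0 are assumptions. Accented letters are kept because Python's `str.isalnum` treats them as alphanumeric; whether they should be transliterated is not settled here.

```python
from collections import defaultdict
from typing import Dict, List


def normalise_headword(headword: str) -> str:
    """Lowercase the headword, drop whitespace and replace every other
    non-alphanumeric character by a dash.
    Note: a headword starting with a digit or a dash would yield an
    invalid xml:id (not an NCName); this case is not handled here."""
    return "".join(
        char if char.isalnum() else "-"
        for char in headword.lower()
        if not char.isspace()
    )


def assign_ids(headwords: List[str]) -> List[str]:
    """Suffix each normalised headword with its rank among the articles
    sharing it (assumed to start at 0), so homonyms get distinct ids."""
    seen: Dict[str, int] = defaultdict(int)
    ids = []
    for headword in headwords:
        base = normalise_headword(headword)
        ids.append(f"{base}-{seen[base]}")
        seen[base] += 1
    return ids


# Hypothetical segmentation in which "Europe" was wrongly split in two...
before = assign_ids(["Cathète", "Europe", "Europe", "Gelocus"])
# ...and the same excerpt once the two pieces have been merged back.
after = assign_ids(["Cathète", "Europe", "Gelocus"])
print(before)  # ['cathète-0', 'europe-0', 'europe-1', 'gelocus-0']
print(after)   # ['cathète-0', 'europe-0', 'gelocus-0']
```

In this hypothetical scenario, fixing the oversegmentation only affects identifiers built on the headword "Europe": `gelocus-0` keeps its value, whereas a global serial would have shifted every article following the fix.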
 ![The container `div` element for article "Cathète"](snippets/cathète_0.png){#fig:cathete-xml-0}
 Inside this element should be a `<head/>` enclosing the headword of the article.
 The usual sub-`<hi/>` elements are available within `<head/>` if the headword is
 highlighted by any special typographic means such as bold, small capitals, etc.
-The one disappointment of the encoding scheme we are defining in this chapter is
+The one disappointment of the encoding scheme being defined in this chapter is
 the lack of support for a proper way to encode subject indicators.
-The best candidate we have found so far was `<usg/>` from the *dictionaries*
+The best candidate found so far was `<usg/>` from the *dictionaries* module but
-module but it is not available directly under a `<head/>` element. All inclusion
+it is not available directly under a `<head/>` element. All inclusion paths from
-paths from the latter to the former of length less than or equal to 3 contain
+the latter to the former of length less than or equal to 3 contain irrelevant
-irrelevant elements (`<cit/>`, `<figure/>`, `<castList/>` and `<nym/>`) so it
+elements (`<cit/>`, `<figure/>`, `<castList/>` and `<nym/>`) so it must be
-must be discarded. The next best elements appear to be `<term/>` (not very
+discarded. The next best elements appear to be `<term/>` (not very accurate) and
-accurate) and `<rs/>` ("referring string", quite a general semantics but a
+`<rs/>` ("referring string", quite a general semantics but a possible match —
-possible match — subject indicators refer to a given domain of knowledge —
+subject indicators refer to a given domain of knowledge — although all the
-although all the examples in the documentation refer to concrete persons,
+examples in the documentation refer to concrete persons, places or objects, not
-places or object, not to the abstract objects that mathematics or poetry are).
+to the abstract objects that mathematics or poetry are).
-For this reason, we do not recommend any special encoding of the subject
+For this reason, no particular encoding of the subject indicator is recommended
-indicator but leave it open to each particular context: they are often
+and it is left open to each particular context: these are often abbreviated, so an
-abbreviated so an `<abbr/>` may apply, in *La Grande Encyclopédie*, biographies
+`<abbr/>` may apply; in *La Grande Encyclopédie*, biographies are not labeled by
-are not labeled by a knowledge domain but usually include the first name of the
+a knowledge domain but usually include the first name of the person when it is
-person when it is known so in that case an element like `<persName/>` is still
+known, so in that case an element like `<persName/>` is still appropriate. This
-appropriate. This choice applied to the same article "Cathète" produces Figure
+choice applied to the same article "Cathète" produces Figure @fig:cathete-xml-1.
-@fig:cathete-xml-1.
 ![Encoding the head word of article "Cathète"](snippets/cathète_1.png){#fig:cathete-xml-1}
-We then propose to wrap each different meaning in a separate `<div/>` with the
+Each different meaning could then be wrapped in a separate `<div/>` with the
 `type` attribute set to `sense` to refer to the `<sense/>` element that would
 have been used within the *dictionaries* module. The `<div/>`s should be numbered
 according to the order they appear in with the `n` attribute starting from `0`
@@ -711,16 +794,16 @@ Figure @fig:boumerang-photo, which should be encoded the standard way by
 ![Encoding the figure in article "Boumerang" and its captions](snippets/boumerang.png){#fig:boumerang-xml}
 Another issue arising from giving up on `<entry/>` is the unavailability of the
-`<xr/>` element, not allowed under any of the *core* elements we use but which
+`<xr/>` element, not allowed under any of the *core* elements used but which is
-is useful to represent cross-references occurring in encyclopedias as well as in
+useful to represent cross-references occurring in encyclopedias as well as in
 dictionaries, for example in article "Gelocus" (see Figure @fig:gelocus-photo).
-We prefer to use the `<ref/>` element instead which is available in the context
+The `<ref/>` element is preferred instead, since it is available in the
-of a `<p/>`. Its `target` attribute should be set to the `xml:id` of the
+context of a `<p/>`. Its `target` attribute should be set to the `xml:id` of the
 article it points to, prefixed with a `'#'` as shown in Figure @fig:gelocus-xml.
 Another solution would have been to introduce a `<dictScrap/>` element for the
-sole purpose of placing an `<xr/>` but we advocate against it on account of the
+sole purpose of placing an `<xr/>` but this would add unwanted verbosity to the
-verbosity it would add to the encoding and the fact that it implicitly suggests
+encoding and implicitly suggest that the previous context was not the one of a
-that the previous context was not the one of a dictionary.
+dictionary, which is rather problematic.
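For illustration, the following Python sketch builds the container `<div/>` of an article with its `xml:id`, its `<head/>`, and a paragraph in which a cross-reference is encoded as a `<ref/>` whose `target` is the identifier of the referenced article prefixed with `'#'`. It relies only on the standard `xml.etree.ElementTree` module; `encode_article` and its segment format are hypothetical, the paragraph text and identifiers are made up, and keeping the "V." marker outside of the `<ref/>` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree always serialises this namespace with the reserved "xml" prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace

# Hypothetical paragraph format: plain text, or (label, xml:id of the target).
Segment = Union[str, Tuple[str, str]]


def tei(tag: str) -> str:
    return f"{{{TEI_NS}}}{tag}"


def encode_article(xml_id: str, headword: str,
                   paragraphs: List[List[Segment]]) -> ET.Element:
    """Container <div/> with its <head/> and one <p/> per paragraph;
    cross-references become <ref target="#..."/> elements."""
    div = ET.Element(tei("div"), {XML_ID: xml_id})
    ET.SubElement(div, tei("head")).text = headword
    for segments in paragraphs:
        p = ET.SubElement(div, tei("p"))
        last: Optional[ET.Element] = None  # text after a <ref/> goes in its tail
        for segment in segments:
            if isinstance(segment, str):
                if last is None:
                    p.text = (p.text or "") + segment
                else:
                    last.tail = (last.tail or "") + segment
            else:
                label, target_id = segment
                last = ET.SubElement(p, tei("ref"), {"target": "#" + target_id})
                last.text = label
    return div


article = encode_article(
    "gelocus-0", "Gelocus",
    [["Hypothetical text (V. ", ("Cathète", "cathète-0"), ")."]],
)
print(ET.tostring(article, encoding="unicode"))
```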
 ![La Grande Encyclopédie, tome 18, article "Gelocus" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/gelocus_t18.png){#fig:gelocus-photo}
@@ -739,7 +822,7 @@ the text, like the beginning of a new column of text or of a new page. Figure
 @fig:alcala-photo shows the top left of the last page of the first tome of *La
 Grande Encyclopédie* which features peritext elements while marking the
 beginning of a new page. The usual appropriate elements (`<pb/>` for page
-beginning, `<cb/>` for column beginning) may and should be used with our
+beginning, `<cb/>` for column beginning) may and should be used with this
 encoding scheme as demonstrated by Figure @fig:alcala-xml.
 ![La Grande Encyclopédie, tome 1, article "Alcala-de-Hénarès" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/last_page_top_left_t1.png){width=350px #fig:alcala-photo}
@@ -752,8 +835,8 @@ developed within the scope of project DISCO-LGE to automatically identify
 individual articles in the flow of raw text from the columns and to encode them
 into XML-TEI files. Though this software has already been used to produce the
 first TEI version of *La Grande Encyclopédie*, it does not yet perfectly follow
-the specification we have just described. Figure @fig:cathete-xml-current shows
+the specification described in this chapter. Figure @fig:cathete-xml-current
-the encoded version of article "Cathète" it currently produces:
+shows the encoded version of article "Cathète" it currently produces:
 ![The current encoding of article "Cathète" produced by `soprano`](snippets/cathète_current.png){#fig:cathete-xml-current}
@@ -797,7 +880,7 @@ information (for the second one, adjacent to a notion as elusive as truth)
 which requires a very deep understanding of a text in its entirety and about
 which even some human experts may disagree.
-For these reasons, a central concern in the design of our encoding scheme was to
+For these reasons, a central concern in the design of this encoding scheme was to
 remain within the boundaries of information that can be described objectively
 and extracted automatically by an algorithm. Most of the tags presented in
 section @sec:core-module contain information about the positions of the elements
@@ -806,30 +889,29 @@ like `<head/>` can be inferred simply from their position and the frequent use
 of a special typography like bold or upper-case characters.
 The case of cross-references is particular and may appear as a counter-example
-to the main principle on which our scheme is based. Actually, the process of
+to the main principle on which this scheme is based. Actually, the process of
 linking from an article to another one is so frequent (in dictionaries as well
 as in encyclopedias) that it generally escapes the scope of regular discourse to
 take a special and often fixed form, inside parentheses and after a special
 token which invites the reader to perform the redirection. In *La Grande
-Encyclopédie*, virtually all the redirections (that is, to the extent of our
+Encyclopédie*, virtually all the redirections appear within parentheses (at
-knowledge, absolutely all of them though of course some special case may exist,
+least no counter-example has been found within the scope of the project), and
-but they are statistically rare enough that we have not found any yet) appear
+start with the verb "voir" abbreviated as a single, capital "V." as illustrated
-within parenthesis, and start with the verb "voir" abbreviated as a single,
+in the article "Gelocus" (see again Figure @fig:gelocus-photo).
-capital "V." as illustrated in the article "Gelocus" (see again Figure
-@fig:gelocus-photo).
+Although this has not been implemented yet either, being able to detect and
+exploit those patterns to correctly encode cross-references does not pose any
-Although this has not been implemented yet either, we hope to be able to detect
+fundamental theoretical problem and should be achievable. Getting the `target`
-and exploit those patterns to correctly encode cross-references. Getting the
+attributes right is certainly more difficult to achieve and may require
-`target` attributes right is certainly more difficult to achieve and may require
 processing the articles in several steps, to first discover all the existing
 headwords — and hence article IDs — before trying to match the words following
-"V." with them. Since our automated encoder handles tomes separately and since
+"V." with them. Since the automated encoder implemented in the project handles
-references may cross the boundaries of tomes, it cannot wait for the target of a
+tomes separately and since references may cross the boundaries of tomes, it
-cross-reference to be discovered by keeping the articles in memory before
+cannot wait for the target of a cross-reference to be discovered by keeping the
-outputting them.
+articles in memory before outputting them.
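The two-step process mentioned above could look like the following Python sketch, given only as an illustration under several assumptions: cross-references are taken to be exactly of the form "(V. Headword)" with nothing else inside the parentheses; headwords are normalised the same way as `xml:id` values; and a reference is only resolved when exactly one article bears the normalised headword, homonyms being left for a later disambiguation step. `build_index`, `resolve` and the sample data are hypothetical. The index has to be built over all tomes before any article is resolved, which is precisely what an encoder processing tomes one at a time cannot do in a single pass.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed form of a cross-reference: "(V. Headword)".
CROSS_REF = re.compile(r"\(V\.\s*([^()]+?)\s*\)")


def normalise(headword: str) -> str:
    # Same rule as for xml:id values: lowercase, no whitespace,
    # other non-alphanumeric characters replaced by a dash.
    return "".join(
        c if c.isalnum() else "-" for c in headword.lower() if not c.isspace()
    )


def build_index(articles: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """First pass over every tome: map each normalised headword to the
    xml:id values of the articles bearing it."""
    index: Dict[str, List[str]] = {}
    for xml_id, headword in articles:
        index.setdefault(normalise(headword), []).append(xml_id)
    return index


def resolve(text: str,
            index: Dict[str, List[str]]) -> List[Tuple[str, Optional[str]]]:
    """Second pass: list the (label, target) pairs found in an article,
    target being '#' + xml:id, or None when missing or ambiguous."""
    found = []
    for match in CROSS_REF.finditer(text):
        label = match.group(1)
        candidates = index.get(normalise(label), [])
        found.append((label, "#" + candidates[0] if len(candidates) == 1 else None))
    return found


index = build_index([("cathète-0", "Cathète"), ("mars-0", "Mars"), ("mars-1", "Mars")])
print(resolve("Lorem (V. Cathète) ipsum (V. Mars) dolor (V. Inconnu).", index))
# [('Cathète', '#cathète-0'), ('Mars', None), ('Inconnu', None)]
```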
-This is in line with the last important aspect of our encoder. If many
+This is in line with the last important aspect of the encoder. While many
-lexicographers may deem our encoding too shallow, it has the advantage of not
+lexicographers may deem this encoding too shallow, it has the advantage of not
 requiring overly complex data structures to be kept in memory for a long time. The
 algorithm implementing it in `soprano` outputs elements as soon as it can. This
 is immediate for simple elements such as `<pb/>` or `<fw/>`; for articles, it
@@ -843,50 +925,55 @@ lowered to around forty minutes on a machine with 16Go of RAM for the whole of
 ## Comparison to other approaches
 The previous section about the structure of the *dictionaries* module and the
-features found in encyclopedias follows quite closely our own journey trying to
+features found in encyclopedias reflects the issues which arose
-encode first manually then by automatic means the articles of our corpus. This
+over the course of the project while trying to encode first manually and then
-back and forth between trying to find patterns in the graph which reflects the patterns
+by automatic means the articles of its corpus. This back and forth between
-found in the text and questioning the relevance of the results explains the
+trying to find patterns in the graph which reflect the patterns found in the
-choice we ended up making but also the alternatives we have considered.
+text and questioning the relevance of the results explains the choice advocated
+in this chapter but also the alternatives considered.
-Several times, the issue of the semantics of some elements which posess the
-properties we need came up. This is the case for instance of the `<sense/>` and
+Several elements exhibited interesting properties, for instance some inclusion
-`<node/>` elements. It is very tempting to bend their documented semantics or to
+paths matching those needed to represent the nested structure of articles. This
-consider that their inclusion properties is part of what defines them, and hence
+is the case for instance of the `<sense/>` and `<note/>` elements. It is very
-justifies their ways in creative ways not directly recommended by the TEI
+tempting to bend their documented semantics or to consider that their inclusion
-specifications.
+properties are part of what defines them, and hence justify their use in
+creative ways not directly recommended
-This is the approach followed by project BASNUM[^BASNUM]. In the articles
+by the TEI specifications.
-encoded for this project, `<note/>` elements are nested and used to structure
-the encyclopedic developments that occur in the articles.
+This is the approach followed by project BASNUM (see section
+@sec:starting-point). In the articles encoded for this project, `<note/>`
-We have chosen not to follow the same path in the name of the FAIR principles to
+elements are nested and used to structure the encyclopedic developments that
-avoid the emergence of a custom usage differing from the documented one.
+occur in the articles.
-The other major reason behind our choice was the inclusion rules which exist
+For the sake of the FAIR principles, this was not the path chosen by project
-between TEI elements and pushed us to look for different combinations. Another
+DISCO-LGE, in order to avoid the emergence of a custom usage differing from the
-valid approach would have consisted in changing the structure of the inclusion
+one documented in the official guidelines.
-graph itself, that is to say modify the rules. If `<entry/>` is the perfect
-element to encode article themselves, all that is really missing is the ability
+The other major reason behind the choice that was ultimately made was the
-to accomodate nested structures with the `<div/>` element. This would also have
+existing TEI rules governing element inclusions which prompted the search for
-the advantage of recovering the `<usg/>` and `<xr/>` elements which we have
+different combinations. Another valid approach would have consisted in changing
-recognised as useful and which we lose as part of the tradeoff to get nested
+the structure of the inclusion graph itself, that is to say, to modify the rules. If
-sections. Generating customised TEI schemas is made really easy with tools like
+`<entry/>` is the perfect element to encode articles themselves, all that is
-ROMA ([https://roma.tei-c.org/](https://roma.tei-c.org/)), which we used to
+really missing is the ability to accommodate nested structures with the `<div/>`
-preview our change and suggest it to the TEI community.
+element. This would also have the advantage of recovering the `<usg/>` and
+`<xr/>` elements which appear useful and which are lost as part of the tradeoff
+to get nested sections. Generating customised TEI schemas is made really easy
+with tools like ROMA ([https://roma.tei-c.org/](https://roma.tei-c.org/)), which
+was used to preview this change and suggest it to the TEI community.
 Although this proposal did not gain wide support, some suggested it could be used locally
-within the scope of project DISCO-LGE. However we chose not to do so, partially
+within the scope of project DISCO-LGE. However, it was preferred not to do so,
-for the same reasons of interoperability as the previous scenario, but also for
+partially for the same reasons of interoperability as the previous scenario, but
-reasons of sturdiness in front of future evolutions. Making sure the alternative
+also for reasons of robustness against future evolutions. Making sure the
-schema would remain useful entails to maintain it, regenerating it should the
+alternative schema would remain useful entails maintaining it, regenerating it
-schema format evolve, with the risk that the tools to edit it might change or
+should the schema format evolve, with the risk that the tools to edit it might
-stop being maintained.
+stop being maintained or that some conflicts between this change and future
+modifications of the official guidelines might arise.
 # Conclusion
-Though they are very close genres and share a common history, we have evidenced
+Though they are very close genres and share a common history, key differences
-key aspects on which dictionaries and encyclopedias differ. Not only do entries
+between dictionaries and encyclopedias have been evidenced. Not only do entries
 tend to be longer in encyclopedias, they often have a deeper structure too.
 Their purpose also departs from the purpose of dictionaries from their
 inception, and, as anticipated by their pioneers, results in a different form of
@@ -894,15 +981,16 @@ discourse.
 The structure of the XML-TEI *dictionaries* module reflects the assumptions made
 by the eponymous genre and does not appear to be flexible enough to accommodate
-encyclopedias. Forcing its use to some encyclopedic articles would breach the
+encyclopedias, despite the colossal effort which has gone into making it
-semantics of some elements or require the encoder to break the rules of the
+expressive enough for the wide variety of existing dictionaries. Forcing its use
-consortium's schema which we think would result in a less reusable encoding in
+to some encyclopedic articles would breach the semantics of some elements or
-opposition to the FAIR principles.
+require the encoder to break the rules of the consortium's schema which would
+result in a less reusable encoding in opposition to the FAIR principles.
-We have devised and presented an encoding scheme which fully complies with
-XML-TEI while being able to represent the content of encyclopedias in all their
+An encoding scheme which fully complies with XML-TEI while being able to
-complexity. A first implementation of this encoding, incomplete as it may be,
+represent the content of encyclopedias in all their complexity has been provided
-demonstrates its practical usefulness.
+and demonstrated on concrete examples. The tool `soprano`, partially
+implementing this set of conventions, demonstrates their practical usefulness.
 # Acknowledgement {-}