Fix some more typos and set a text width on articles pictures

Commit 6e8e6305 authored 11 months ago by Alice Brenon
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -190,7 +190,7 @@ lifetime may be achieved by a common effort throughout generations.
 History hints that Diderot's opponents took his defence of the feasability of
 the project quite seriously, considering the fact that they got the *EDdA*'s
-privileges to be revoked again six years after its publication was resumed
+privileges revoked again six years after its publication was resumed
 [@moureau2001]. As a consequence, the remaining ten volumes containing the text
 of the articles had to be published illegally until 1765, thanks to the secret
 protection of Malesherbes who — despite being head of royal censorship — saved
@@ -246,7 +246,7 @@ to future scientific projects, which in particular requires it to be
 *interoperable* and *reusable*. These are the two last key aspects of the FAIR
 ([https://www.go-fair.org/fair-principles/](https://www.go-fair.org/fair-principles/))
 principles (*findability*, *accessibility*, *interoperability* and
-*reusability*) which are important guideline for efficient, high-quality
+*reusability*) which are important guidelines for efficient, high-quality
 research. This section starts by describing the existing toolset provided by the
 XML-TEI guidelines to achieve this goal, before introducing some notations and
 tools from graph theory which will be used to browse the guidelines in a
@@ -261,7 +261,7 @@ and BASNUM
 ([https://anr.fr/Projet-ANR-18-CE38-0003](https://anr.fr/Projet-ANR-18-CE38-0003))
 to encode respectively the *Petit Larousse Illustré* published by Pierre
 Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*, and the
-*Dictionnaire Universel* by Furetière, or rather its second edition edited by
+*Dictionnaire Universel* by Furetière, or rather its second version edited by
 Henri Basnage de Beauval, an encyclopedic dictionary from the very early 18^th^
 century [@williams2017, p. 1]. These successes suggested it to be a useful tool
 to encode encyclopedias but a few differences remained between both projects and
@@ -313,7 +313,7 @@ TEI framework could build.
 The XML-TEI guidelines graph will hence be defined as follows. One node is
 created for each one of the 590 elements found in the specification. Then, an
 edge is placed between source node `A` and destination `B` if the schema states
-that the element represented by `B` can be contained directly under the element
+that the element represented by `B` can be contained directly by the element
 represented by `A`. That is, the edges in the graph represent the relation "is
 an admissible direct parent of" (written infix, as in "A is connected to B" if
 and only if "A is an admissible direct parent of B"). Please note that the word
@@ -347,8 +347,8 @@ length of an inclusion path will be called its *depth*.
 The ability for an element to contain itself corresponds directly to loops on
 the graph (that is an edge from a node to itself) as can be illustrated by the
-`<abbr/>` element: an `<abbr/>` element (abbreviation) can directly contain
+`<entry/>` element on figure \ref{fig:dictionaries-subgraph}: an `<entry/>`
-another one.
+element (abbreviation) can directly contain another one.
 The generalisation of this to inclusion paths of any length greater than one is
 usually called a cycle and it appears natural to refine this and name them
@@ -365,7 +365,7 @@ through a `<form/>` or a `<gramGrp/>` because a thorough traversal reporting all
 the possible paths will contain `entry-form-pos` and `entry-gramGrp-pos`. It is
 left to the human encoder to rate the relevance of the path found and to select
 an appropriate one. A total lack of path proves the impossibility of an
-inclusion; an abnormally high length for the shortest path is a serious hint
+inclusion; an abnormally high depth for the shortest path is a serious hint
 that the inclusion should not be possible and is not meaningful.
 Another relevant example of the use of these methods can be given by querying
@@ -465,8 +465,8 @@ Secondly, although examples of connections from this module to the rest of the
 XML-TEI have been evidenced in this section, especially to the *core* module (to
 which belongs for example the `<ref/>` element), the *dictionaries* module
 appears somewhat isolated from important structural elements like `<head/>` or
-`<div/>`. Indeed, computing all the paths from either `<entry/>` or `<sense/>`
+`<div/>`. Indeed, computing all the paths of length shorter or equal to 5 from
-elements to the latter of length shorter or equal to 5 by a systematic traversal
+either `<entry/>` or `<sense/>` elements to the latter by a systematic traversal
 of the graph yields exclusively paths (respectively 8 943 and 38 649 of them
 excluding loops) containing either a `<floatingText/>` or an `<app/>` element.
 The first one, as its name aptly suggests, is used to encode text that does not
@@ -530,7 +530,7 @@ knowledge, and the occurrence at the beginning of articles, more than a tool to
 clear up possible ambiguities also points the reader to the correct place in
 this mind map.
-!["Systême figuré des connoissances humaines", the taxonomy at the heart of the Encyclopédie ([Wikimedia Commons](https://commons.wikimedia.org/wiki/File:ENC_SYSTEME_FIGURE.jpeg?uselang=fr#filelinks))](ressources/arbre.png){width=300px #fig:systeme-figure}
+!["Systême figuré des connoissances humaines", the taxonomy at the heart of the Encyclopédie ([Wikimedia Commons](https://commons.wikimedia.org/wiki/File:ENC_SYSTEME_FIGURE.jpeg?uselang=fr#filelinks))](ressources/arbre.png){#fig:systeme-figure}
 The situation regarding subject indicators is hardly better outside of the
 module. The `<domain/>` element despite its name belongs exclusively in the
@@ -559,7 +559,7 @@ describing their relation to events and other persons comes out even further
 from the notion of meaning. Entries such as the one about SANJO Sanetomi (see
 Figure @fig:sanjo) do not constitute a *definition*.
-![Beginning of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo}
+![Beginning of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo width=65%}
 Moreover, encyclopedias, because of all that they have inherited from the
 philosophical Enlightenment, are not only spaces designed to assert, they also
@@ -568,7 +568,7 @@ basis required to understand the complexity of an issue and invite the reader to
 consider it without providing a definitive answer, going as far as to explicitly
 use question marks as in the article "Action" displayed in Figure @fig:action.
-![Excerpt from article "Action", in La Grande Encyclopédie, tome 1 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/action_t1.png){#fig:action}
+![Excerpt from article "Action", in La Grande Encyclopédie, tome 1 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/action_t1.png){#fig:action width=65%}
 In this extract, the author devises a hypothetical situation to illustrate how
 difficult it is to draw the line between two supposedly mutually exclusive
@@ -579,9 +579,9 @@ idea that the term eludes definition, wrapping it in a `<sense/>`, or worse, a
 As a result, the use of `<sense/>` and `<def/>` is not appropriate for
 encyclopedic content in general.
-The final difficulty can be considered as a partial consequence of the previous
+The final difficulty can be considered a partial consequence of the previous one
-one on the structure of articles. The difficulty to define complex concepts is
+on the structure of articles. The difficulty to define complex concepts is the
-the very reason why authors approach their subjects from various angles,
+very reason why authors approach their subjects from various angles,
 circumnavigating it as a best approximation. This strategy favours long,
 structured developments with sections and subsections covering the multiple
 aspects of the topic: from a historical, political, scientific point of view…
@@ -613,23 +613,22 @@ Filtering the content of the module to keep only the elements which can at the
 same time contain themselves, be included under `<entry/>` and include a `<p/>`
 and either the `<head/>` or `<title/>` elements yields absolutely no candidates.
 It is remarkable that even replacing the `<entry/>` element for the root of each
-article with an `<entryFree/>`, an element supposed to relax some constraint to
+article with an `<entryFree/>`, an element supposed to relax the constraints to
-accomodate more unusual structure in dictionaries does not bring any
+accomodate more unusual structures in dictionaries does not bring any
 improvement.
-The lack of results from these simple queries forces one to somewhat release the
+The lack of results from these simple queries forces one to adopt a less
-constraints on the encoding one is willing to use. The occurrence of an
+restrictive approach to find an encoding. The occurrence of an intermediate
-intermediate element could for instance be needed between the element wrapping
+element could for instance be needed between the element wrapping the whole
-the whole article and the recursing one used to encode each section. This
+article and the recursing one used to encode each section. This "section"
-"section" element could also need a companion element to be able to include
+element could also need a companion element to be able to include itself, or, to
-itself, or, to formalise it in terms of graph theory, the condition that this
+formalise it in terms of graph theory, the condition that this element admits a
-element admits a loop could be relaxed to consider instead cycles of a given
+loop could be relaxed to consider instead cycles of a given (small, this still
-(small, this still needs to represent a fairly direct inclusion) length to be
+needs to represent a fairly direct inclusion) length to be enough.
-enough. Simultaneously the maximum depth of the inclusion paths between
+Simultaneously the maximum depth of the inclusion paths between `<entry/>`, the
-`<entry/>`, the pair of elements and the `<p/>` element will be increased to
+pair of elements and the `<p/>` element will be increased to yield more results.
-yield more results.
+By setting this depth to 2, that is, by accepting one intermediate element to
-By setting this depth to 3, that is, by accepting one intermediate element to
 occur in the middle of each one of the inclusion paths that define the structure
 required to encode encyclopedic discourse, 21 elements can be found, none of
 which stands out as an obvious good solution: all paths to include the `<p/>`
@@ -641,9 +640,9 @@ works) or a `<state/>` (used to describe a temporary quality in a person or
 place), again not even close to what is wanted. The paths to either `<head/>` or
 `<title/>` are similarly disappointing. Again, changing `<entry/>` for
 `<entryFree/>` returns the exact same candidates. If that is not a definite
-proof that none of these elements could the investigated criteria, it is a fact
+proof that none of these elements could meet the investigated criteria, it is a
-than no element in this module stands out as the obvious good solution and a
+fact than no element in this module stands out as the obvious good solution and
-serious hint to keep looking somewhere else.
+a serious hint to keep looking somewhere else.
 Therefore, the search is extended again to include elements outside the
 *dictionaries* module which could be used to encode the sections and
@@ -689,7 +688,7 @@ right under the `<body/>` element representing a whole volume. Everything
 related to its metadata happens as expected in the file's `<teiHeader/>` which
 is well-enough equiped to handle them. In order to present the scheme throughout
 the following section a reference article, "Cathète" from tome 9 — reproduced in
-Figure @fig:cathete-photo — will be progressively encoding.
+Figure @fig:cathete-photo — will be encoded step by step.
 ![La Grande Encyclopédie, tome 9, article "Cathète" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/cathète_t9.png){#fig:cathete-photo}
@@ -715,12 +714,13 @@ sharing the same head. Thus, if an oversegmentation or a subsegmentation are
 fixed (meaning respectively that two "articles" get fusioned or that one
 "article" actually contained several which get split as such) only articles with
 the same headword are impacted. Figure @fig:cathete-xml-0 illustrates this
-choice for the container element on the article "Cathète" previously displayed.
+choice for the container element on the article "Cathète" displayed on figure
+\ref{fig:cathete-photo}.
 ![The container `div` element for article "Cathète"](snippets/cathète_0.png){#fig:cathete-xml-0}
 Inside this element should be a `<head/>` enclosing the headword of the article.
-The usual sub-`<hi/>` elements are available within `<head/>` if the headword is
+The usual `<hi/>` elements are available within `<head/>` if the headword is
 highlighted by any special typographic means such as bold, small capitals, etc.
 The one disappointment of the encoding scheme being defined in this chapter is
 the lack of support for a proper way to encode subject indicators.
@@ -746,9 +746,9 @@ choice applied to the same article "Cathète" produces Figure @fig:cathete-xml-1
 Each different meaning could then be wrapped in a separate `<div/>` with the
 `type` attribute set to `sense` to refer to the `<sense/>` element that would
-have been used within the *core* module. The `<div/>`s should be numbered
+have been used within the *dictionaries* module. The `<div/>`s should be
-according to the order they appear in with the `n` attribute starting from `0`
+numbered according to the order they appear in with the `n` attribute starting
-as shown in Figure @fig:cathete-xml-2.
+from `0` as shown in Figure @fig:cathete-xml-2.
 ![The empty structure for the only meaning of the word "Cathète"](snippets/cathète_2.png){#fig:cathete-xml-2}
@@ -761,7 +761,7 @@ information to reconstruct a faithful facsimile but it also has the advantage of
 highlighting the fact than even though the definition is cut from the headword
 by being in a separate XML element, they still occur on the same line, which is
 a typographic choice usually made both in encyclopedias and dictionaries where
-space is at a premium. .
+space is at a premium.
 To complete the structure, the various sections and subsections occurring
 within the article body may be nested as usual with `<div/>` and sub-`<div/>`s,
@@ -790,7 +790,7 @@ sole purpose of placing an `<xr/>` but this would add unwanted verbosity to the
 encoding and implicitly suggest that the previous context was not the one of a
 dictionary which is rather problematic.
-![La Grande Encyclopédie, tome 18, article "Gelocus" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/gelocus_t18.png){#fig:gelocus-photo}
+![La Grande Encyclopédie, tome 18, article "Gelocus" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/gelocus_t18.png){#fig:gelocus-photo width=65%}
 ![Encoding the cross-references in article "Gelocus"](snippets/gelocus.png){#fig:gelocus-xml}
@@ -836,27 +836,24 @@ entry's `<div/>` element instead of under a set of nested `<div/>` elements. The
 paragraphs are not yet identified and for this reason not encoded.
 However, the figures and their captions are already handled correctly when they
-occur. The encoder also keeps track of the current lines, pages, and columns and
+occur. The encoder also keeps track of the current lines, pages, and columns to
-inserts the corresponding empty elements (`<lb/>`, `<pb/>` or `<cb/>`) and
+insert the corresponding empty elements (`<lb/>`, `<pb/>` or `<cb/>`) and number
-numbers pages so that the numbering corresponding to the physical pages are
+pages according to the order of the physical pages in the book, as compared to
-available, as compared to the "high-level" pages numbers inserted by the
+the "high-level" pages numbers inserted by the editors, which start with an
-editors, which start with an offset because the first, blank or almost empty
+offset because the first, blank or almost empty pages at the beginning of each
-pages at the beginning of each book do not have a number and which sometimes have
+book do not have a number and which sometimes have gaps when a full-page
-gaps when a full-page geographical map is inserted since those are printed
+geographical map is inserted since those are printed separately on a different
-separately on a different folio which remains outside of the textual numbering
+folio which remains outside of the textual numbering system. The place at which
-system. The place at which these layout-related elements occur is determined by
+these layout-related elements occur is determined by the place where the OCR
-the place where the OCR software detected them and by the reordering performed
+software detected them and by the reordering performed by `soprano` when
-by `soprano` when inferring the reading order before segmenting the articles.
+inferring the reading order before segmenting the articles.
 ## The constraints of automated processing
 Encyclopedias are particularly long books, spanning numerous tomes and
-containing several tenths of thousands of articles. The *EDdA* comprises
+containing several tenths of thousands of articles. The *EDdA* comprises over
-over 74k articles and *LGE* certainly more than 100k (the latest
+74k articles and *LGE* certainly more than 100k (the latest version produced by
-version produced by `soprano` created 160k articles, but their segmentation is
+`soprano` created 160k articles, but their segmentation is still not perfect).
-still not perfect and if some article beginning remain undetected, all the very
-long and deeply-structured articles are unduly split into many parts, resulting
-globally in an overestimation of the total number).
 XML-TEI is a very broad tool useful for very different applications. Some
 elements like `<unclear/>` or `<factuality/>` can encode subtle semantics