Rework the cross-section references to get rid of 'the above remarks'

Commit d0c9e718 authored 2 years ago by Alice Brenon
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -189,7 +189,7 @@ d'Alembert insists on the importance of brevity for a clear definition in the
 consider encyclopedias superior to dictionaries but really as a new subgenre
 departing from them in terms of purpose.
-# The *dictionaries* TEI module
+# The *dictionaries* TEI module {#sec:dictionaries-module}
 The XML-TEI standard has a modular structure consisting of optional parts each
 covering specific needs such as the physical features of a source document, the
@@ -406,13 +406,13 @@ element made to group quotations with a bibliographic reference to their source
 which should clearly be unnecessary to encode an article in the general case.
 Secondly, although we have seen examples of connections from this module to the
-rest of the XML-TEI, especially to the *core* module (see the case of the
+rest of the XML-TEI, especially to the *core* module (to which belongs for
-`<ref/>` element above), the *dictionaries* module appears somewhat isolated
+example the `<ref/>` element), the *dictionaries* module appears somewhat
-from important structural elements like `<head/>` or `<div/>`. Indeed, computing
+isolated from important structural elements like `<head/>` or `<div/>`. Indeed,
-all the paths from either `<entry/>` or `<sense/>` elements to the latter of
+computing all the paths from either `<entry/>` or `<sense/>` elements to the
-length shorter than or equal to 5 by a systematic traversal of the graph yields
+latter of length shorter than or equal to 5 by a systematic traversal of the graph
-exclusively paths (respectively 9042 and 39093 of them) containing either a
+yields exclusively paths (respectively 9042 and 39093 of them) containing either
-`<floatingText/>` or an `<app/>` element. The first one, as its name aptly
+a `<floatingText/>` or an `<app/>` element. The first one, as its name aptly
 suggests, is used to encode text that does not quite fit the regular flow of the
 document, as for example in the context of an embedded narrative. Both examples
 displayed in the online documentation feature a `<body/>` as direct child of
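
The "systematic traversal of the graph" mentioned in this hunk can be illustrated with a minimal sketch. It is not the code used for the paper: the `MAY_CONTAIN` table is a hypothetical toy fragment rather than the real TEI content model, the function name `paths` is invented here, and path length is assumed to be counted in edges.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy "may contain" relation; NOT the real TEI content model.
MAY_CONTAIN: Dict[str, List[str]] = {
    "entry": ["sense", "cit"],
    "sense": ["sense", "cit"],
    "cit": ["quote"],
    "quote": ["floatingText"],
    "floatingText": ["body"],
    "body": ["div"],
    "div": ["div", "head", "p"],
}


def paths(graph: Dict[str, List[str]], source: str, target: str,
          max_length: int) -> Iterator[Tuple[str, ...]]:
    """Yield every path from source to target made of at most max_length edges."""
    stack: List[Tuple[str, ...]] = [(source,)]
    while stack:
        path = stack.pop()
        if len(path) > 1 and path[-1] == target:
            yield path  # stop at the first arrival on the target
            continue
        if len(path) - 1 < max_length:  # still allowed to add one more edge
            for child in graph.get(path[-1], []):
                stack.append(path + (child,))


found = list(paths(MAY_CONTAIN, "entry", "div", 5))
```

On this toy graph the only path found from `entry` to `div` goes through `floatingText`, which mirrors the kind of observation reported in the text; the real counts (9042 and 39093) obviously depend on the full schema.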
@@ -433,12 +433,13 @@ structures like `<div/>`.
 Studying the content of *La Grande Encyclopédie* and considering several
 articles in particular, we identify structures which are specific to
-encyclopedias and not compatible with the *dictionaries* module presented above.
+encyclopedias and not compatible with the *dictionaries* module presented in the
-We hence conclude that this module is not able to encode arbitrary encyclopedic
+previous section. We hence conclude that this module is not able to encode
-content and propose a new fully TEI-compliant encoding scheme remaining outside
+arbitrary encyclopedic content and propose a new fully TEI-compliant encoding
-of it. We proceed with remarks about the needs of automated encoding processes
+scheme remaining outside of it. We proceed with remarks about the needs of
-and compare our proposal with other strategies to overcome the issues previously
+automated encoding processes and compare our proposal with other strategies to
-identified with the dedicated module for dictionaries.
+overcome the issues previously identified with the dedicated module for
+dictionaries.
 ## Idiosyncrasies of encyclopedias
@@ -455,7 +456,7 @@ system. Those generally cover a broad range of subjects from scientific
 disciplines to literature, and extending to political subjects and law.
 No element in the *dictionaries* module is explicitly designed for the purpose
-of encoding these indicators. As we have seen above, the elements set is geared
+of encoding these indicators. As we have seen, the elements set is geared
 towards the words themselves instead of the concept they represent. The closest
 tool for what we need is found in the `<usg/>` element used with a specific
 `type` attribute set to `dom` for "domain". Indeed several examples from the
@@ -553,8 +554,6 @@ and `<title/>` — the latter with the possibility to set its `type` attribute t
 `sub` — stand out as the best candidates for the semantics condition on the
 second element.
-#### Candidates in the *dictionaries* module {-}
 Filtering the content of the module to keep only the elements which can at the
 same time contain themselves, be included under `<entry/>` and include a `<p/>`
 and either the `<head/>` or `<title/>` elements yields absolutely no candidates.
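
The filter described in this paragraph can be expressed as a simple predicate over a containment table. The sketch below is only illustrative: `TOY_MODEL` is a hypothetical, simplified table (not the actual TEI schema) and `candidates` is a name chosen for this example.

```python
from typing import Dict, List, Set

# Hypothetical, simplified containment table; NOT the actual TEI schema.
TOY_MODEL: Dict[str, Set[str]] = {
    "entry": {"sense", "note", "form"},
    "sense": {"sense", "def", "note"},
    "note": {"p", "title"},
    "div": {"div", "head", "p"},
}


def candidates(model: Dict[str, Set[str]]) -> List[str]:
    """Elements that contain themselves, fit under <entry/>, and accept
    a <p/> together with either a <head/> or a <title/>."""
    under_entry = model.get("entry", set())
    return sorted(
        name
        for name, children in model.items()
        if name in children                   # can contain itself
        and name in under_entry               # can be included under <entry/>
        and "p" in children                   # can include <p/>
        and children & {"head", "title"}      # and <head/> or <title/>
    )


result = candidates(TOY_MODEL)
```

In the toy table each element fails at least one condition (`sense` has no `<p/>`, `note` is not recursive, `div` is not allowed under `entry`), so the result is empty, as in the situation the text reports.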
@@ -590,8 +589,6 @@ elements could fulfill our purpose, it is a fact that no element in this module
 appears as an obvious good solution and a serious hint to keep looking somewhere
 else.
-#### Widening the search {-}
 We hence widen our search to include elements outside the *dictionaries* module
 which could be used to encode our sections and subsections, under the same
 constraint as before to try and find a composite solution that would remain
@@ -617,25 +614,26 @@ comment" which appears "out of the main textual stream" whereas the long
 developments in articles are the very matter of the text of encyclopedias, not
 mere remarks in the margins or at the foot of pages.
-## Encoding within the *core* module
+## Encoding within the *core* module {#sec:core-module}
-The above remarks explain why the *dictionaries* module is unable to represent
+The remarks made in section @sec:dictionaries-module explain why the
-encyclopedias, where the notion of "meaning" is less central than in
+*dictionaries* module is unable to represent encyclopedias, where the notion of
-dictionaries and where discourse with nested structures of arbitrary depth can
+"meaning" is less central than in dictionaries and where discourse with nested
-occur. Even composite encodings using elements outside of the *dictionaries*
+structures of arbitrary depth can occur. Even composite encodings using elements
-module under an `<entry/>` element do not meet our requirements. Since the
+outside of the *dictionaries* module under an `<entry/>` element do not meet our
-*core* module of course accommodates these structures by means of the `<div/>`,
+requirements. Since the *core* module obviously accommodates these structures by
-`<head/>` and `<p/>` elements which have the additional advantage of carrying
+means of the `<div/>`, `<head/>` and `<p/>` elements which have the additional
-less semantic payload than `<sense/>` or `<def/>` we devise an encoding scheme
+advantage of carrying less semantic payload than `<sense/>` or `<def/>` we
-using them which we recommend using for other projects aiming at representing
+devise an encoding scheme using them which we recommend using for other projects
-encyclopedias.
+aiming at representing encyclopedias.
-To remain consistent with the above remarks we will only concern ourselves with
+To remain consistent with the way we studied the *dictionaries* module we will
-what happens at the level of each article, right under the `<body/>` element.
+only concern ourselves with what happens at the level of each article, right
-Everything related to metadata happens as expected in the file's `<teiHeader/>`
+under the `<body/>` element. Everything related to metadata happens as expected
-which is well-enough equipped to handle them. In order to present our scheme
+in the file's `<teiHeader/>` which is well-enough equipped to handle them. In
-throughout the following section we will be progressively encoding a reference
+order to present our scheme throughout the following section we will be
-article, "Cathète" from tome 9 reproduced in Figure @fig:cathete-photo.
+progressively encoding a reference article, "Cathète" from tome 9 reproduced in
+Figure @fig:cathete-photo.
 ![La Grande Encyclopédie, tome 9, article "Cathète" ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/cathète_t9.png){#fig:cathete-photo}
@@ -748,15 +746,14 @@ encoding scheme as demonstrated by Figure @fig:alcala-xml.
 ![Encoding the beginning of a page in article "Alcala-de-Hénarès"](snippets/alcala.png){#fig:alcala-xml}
-The reference implementation for this encoding scheme is the program
+The reference implementation for this encoding scheme is the program soprano
-soprano
+([https://gitlab.huma-num.fr/disco-lge/soprano](https://gitlab.huma-num.fr/disco-lge/soprano))
-([https://gitlab.huma-num.fr/disco-lge/soprano](https://gitlab.huma-num.fr/disco-lge/soprano)) developed within the scope of project DISCO-LGE to
+developed within the scope of project DISCO-LGE to automatically identify
-automatically identify individual articles in the flow of raw text from the
+individual articles in the flow of raw text from the columns and to encode them
-columns and to encode them into XML-TEI files. Though this software has already
+into XML-TEI files. Though this software has already been used to produce the
-been used to produce the first TEI version of *La Grande Encyclopédie*, it does
+first TEI version of *La Grande Encyclopédie*, it does not follow perfectly yet
-not yet follow the above specification perfectly. Figure
+the specification we have just described. Figure @fig:cathete-xml-current shows
-@fig:cathete-xml-current shows the encoded version of article "Cathète" it
+the encoded version of article "Cathète" it currently produces:
-currently produces:
 ![The current encoding of article "Cathète" produced by `soprano`](snippets/cathète_current.png){#fig:cathete-xml-current}
@@ -802,11 +799,11 @@ which even some human experts may disagree.
 For these reasons, a central concern in the design of our encoding scheme was to
 remain within the boundaries of information that can be described objectively
-and extracted automatically by an algorithm. Most of the tags presented above
+and extracted automatically by an algorithm. Most of the tags presented in
-contain information about the positions of the elements or their relation to one
+section @sec:core-module contain information about the positions of the elements
-another. Those with an additional semantic implication like `<head/>` can be
+or their relation to one another. Those with an additional semantic implication
-inferred simply from their position and the frequent use of a special typography
+like `<head/>` can be inferred simply from their position and the frequent use
-like bold or upper-case characters.
+of a special typography like bold or upper-case characters.
 The case of cross-references is particular and may appear as a counter-example
 to the main principle on which our scheme is based. Actually, the process of
@@ -818,7 +815,7 @@ Encyclopédie*, virtually all the redirections (that is, to the extent of our
 knowledge, absolutely all of them though of course some special case may exist,
 but they are statistically rare enough that we have not found any yet) appear
 within parentheses, and start with the verb "voir" abbreviated as a single,
-capital "V." as illustrated above in the article "Gelocus" (see again Figure
+capital "V." as illustrated in the article "Gelocus" (see again Figure
 @fig:gelocus-photo).
 Although this has not been implemented yet either, we hope to be able to detect
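
The typographic convention described in this hunk (redirections in parentheses, introduced by a capital "V.") lends itself to a pattern-based detection. The text states that this is not implemented yet, so the following is only a hedged sketch and not `soprano`'s code; the pattern `XREF`, the function `find_redirections` and the sample sentence are all made up for illustration.

```python
import re
from typing import List

# Assumed pattern: an opening parenthesis, a capital "V.", whitespace,
# then the redirection target up to the closing parenthesis.
XREF = re.compile(r"\(V\.\s+([^()]+?)\)")


def find_redirections(text: str) -> List[str]:
    """Return the targets of all "(V. ...)" redirections found in text."""
    return [match.group(1).strip() for match in XREF.finditer(text)]


# Invented sample text, not a quotation from La Grande Encyclopédie.
sample = "EXEMPLE (V. AUTRE ARTICLE). Suite du texte (voir plus loin)."
targets = find_redirections(sample)
```

The second parenthesis in the sample is deliberately not matched, since it lacks the abbreviated capital "V." that the convention relies on.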
@@ -834,10 +831,10 @@ outputting them.
 This is in line with the last important aspect of our encoder. If many
 lexicographers may deem our encoding too shallow, it has the advantage of not
 requiring overly complex data structures to stay in memory for a long time. The
-algorithm implementing it in `soprano` outputs elements as soon as it can, for
+algorithm implementing it in `soprano` outputs elements as soon as it can. This
-instance the empty elements already discussed above. For articles, it pushes
+is immediate for simple elements such as `<pb/>` or `<fw/>`; for articles, it
-lines onto a stack and flushes it each time it encounters the beginning of the
+pushes lines onto a stack and flushes it each time it encounters the beginning
-following article. This allows the amount of memory required to remain
+of the following article. This allows the amount of memory required to remain
 reasonable and even lets them be parallelised on most modern machines. Thus,
 even taking over three minutes per tome, the total processing time can be
 lowered to around forty minutes on a machine with 16 GB of RAM for the whole of
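
The streaming strategy described in this hunk (accumulate lines, flush when the next article begins) can be sketched as a generator. This is an illustration under assumptions, not `soprano`'s implementation: the names `split_articles` and `starts_article` are hypothetical, and the headword test used in the example is a deliberately naive stand-in for the real detection.

```python
from typing import Callable, Iterable, Iterator, List


def split_articles(lines: Iterable[str],
                   starts_article: Callable[[str], bool]) -> Iterator[List[str]]:
    """Group a stream of lines into articles, keeping only one article in memory."""
    pending: List[str] = []
    for line in lines:
        if starts_article(line) and pending:
            yield pending      # flush the article that just ended
            pending = []
        pending.append(line)
    if pending:
        yield pending          # flush the last article of the stream


def naive_start(line: str) -> bool:
    """Naive assumption: an article starts with an upper-case headword before a period."""
    return line.split(".")[0].isupper()


articles = list(split_articles(
    ["ALPHA. first line", "continued", "BETA. second article"], naive_start))
```

Because each article is emitted as soon as the following one starts, memory use stays bounded by the longest article, which is what makes processing several tomes in parallel practical.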
@@ -886,7 +883,7 @@ schema would remain useful entails to maintain it, regenerating it should the
 schema format evolve, with the risk that the tools to edit it might change or
 stop being maintained.
-# Conclusion {-}
+# Conclusion
 Though they are very close genres and share a common history, we have evidenced
 key aspects on which dictionaries and encyclopedias differ. Not only do entries