From c38691aed45cc3052dca85e4f94c6b571daa3a19 Mon Sep 17 00:00:00 2001 From: Alice BRENON <alice.brenon@ens-lyon.fr> Date: Tue, 23 Jul 2024 16:49:16 +0200 Subject: [PATCH] Fix typos *I* found in the text --- ICHLL_Brenon.md | 75 ++++++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 38 deletions(-) diff --git a/ICHLL_Brenon.md b/ICHLL_Brenon.md index f68bee4..7afe17c 100644 --- a/ICHLL_Brenon.md +++ b/ICHLL_Brenon.md @@ -206,7 +206,7 @@ projects of the 18^th^ century. In this version, the definition was entirely reworked, mildly stating that good encyclopedias are difficult to make because of the amount of knowledge necessary and work needed to keep up with scientific progress instead of calling the effort a parody. It credits -Chamber's *Cyclopædia* for being a decent attempt before referring anonymously +Chambers' *Cyclopædia* for being a decent attempt before referring anonymously though quite explicitly to Diderot and d'Alembert's project by naming the collective "Une Société de gens de Lettres" and writing that it started in 1751. Even more importantly, two new entries were added after it: one for the @@ -260,18 +260,17 @@ NENUFAR and BASNUM ([https://anr.fr/Projet-ANR-18-CE38-0003](https://anr.fr/Projet-ANR-18-CE38-0003)) to encode respectively the *Petit Larousse Illustré* published by Pierre -Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE* +Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*, and the *Dictionnaire Universel* by Furetière, or rather its second edition edited by Henri Basnage de Beauval, an encyclopedic dictionary from the very early 18^th^ century [@williams2017, p. 1]. These successes suggested it to be a useful tool to encode encyclopedias but a few differences remained between both projects and DISCO-LGE: the text studied by NENUFAR does not have the encyclopedic dimension -*LGE* has and BASNUM studies a much older text which had a tremendous influence on the -european encyclopedic effort of the 18^th^ century but is not as clearly -separated from the dictionaric stem as *La Grande Encyclopédie* is. For these -reasons, the encoding schemes used in these projects could not be reused -directly, prompting for a systematic exploration of the XML-TEI schema to devise -a new one. +*LGE* has and BASNUM studies a much older text which had a tremendous influence +on the european encyclopedic effort of the 18^th^ century but is not as clearly +separated from the dictionaric stem as *LGE* is. For these reasons, the encoding +schemes used in these projects could not be reused directly, prompting for a +systematic exploration of the XML-TEI schema to devise a new one. This chapter discusses XML elements and hence needs to name and manipulate them. They will be represented in a monospace font, in the standard XML autoclosing @@ -315,14 +314,15 @@ The XML-TEI guidelines graph will hence be defined as follows. One node is created for each one of the 590 elements found in the specification. Then, an edge is placed between source node `A` and destination `B` if the schema states that the element represented by `B` can be contained directly under the element -represented by `B`. That is, the edges in the graph represent the relation "is -an admissible direct parent of". Please note that the word "element" is here -used with the same meaning as in the TEI documentation to refer to the -conceptual device characterised by a given tag name such as `p` or `div` and not -to a particular instance of them that may occur in a given document. Figure -@fig:dictionaries-subgraph, by using this transformation to display only the -*dictionaries* module, hints at the overall complexity of the whole -specification. +represented by `A`. That is, the edges in the graph represent the relation "is +an admissible direct parent of" (written infix, as in "A is connected to B" if +and only if "A is an admissible direct parent of B"). Please note that the word +"element" is here used with the same meaning as in the TEI documentation to +refer to the conceptual device characterised by a given tag name such as `p` or +`div` and not to a particular instance of them that may occur in a given +document. Figure @fig:dictionaries-subgraph, by using this transformation to +display only the *dictionaries* module, hints at the overall complexity of the +whole specification. {height=830px #fig:dictionaries-subgraph} @@ -362,13 +362,13 @@ Using inclusion paths lets one find for instance that although `<pos/>` may not be directly included within `<entry/>` elements to include information about the part-of-speech of the word that an article defines, the correct way to do so is through a `<form/>` or a `<gramGrp/>` because a thorough traversal reporting all -the possible paths will contain `entry-form-pos` and `entry-grapmGrp-pos`. It is +the possible paths will contain `entry-form-pos` and `entry-gramGrp-pos`. It is left to the human encoder to rate the relevance of the path found and to select an appropriate one. A total lack of path proves the impossibility of an inclusion; an abnormally high length for the shortest path is a serious hint that the inclusion should not be possible and is not meaningful. -Another relevant example on the use of these methods can be given by querying +Another relevant example of the use of these methods can be given by querying the shortest inclusion path of a `<pos/>` under the `<body/>` of the document: it yields an inclusion directly through `<entryFree/>` (with an inclusion path of length 2), which unlike `<entry/>` accepts it as a direct child node. @@ -387,7 +387,7 @@ associated to its definition. It is the natural way in from the `<body/>` element to the *dictionaries* module: indeed, although `<body/>` may also contain `<entryFree/>` or `<superEntry/>` elements, the former is a relaxed version of `<entry/>` while the latter is a device to group several related -entries together. Both can contain an `<entry/` directly while no obvious +entries together. Both can contain an `<entry/>` directly while no obvious inclusion exists the other way around: most (> 96.2%) of the inclusion paths of "reasonable" depth (which will be arbitrarily defined as strictly inferior to 5, that is twice the average shortest depth between any two nodes) either include @@ -396,7 +396,7 @@ to appear in an article in general, showing that the purpose of `<entry/>` is not to contain an `<entryFree/>` or `<superEntry/>`. Hence, not only the semantics conveyed by the documentation but also the structure of the elements graph evidence `<entry/>` as the natural top-most element for an article. This -example demonstrate again how a graph-centred approach can provide insights +example demonstrates again how a graph-centred approach can provide insights about the XML-TEI schema. Once a block for an article is created, it may contain elements useful to @@ -467,13 +467,13 @@ which belongs for example the `<ref/>` element), the *dictionaries* module appears somewhat isolated from important structural elements like `<head/>` or `<div/>`. Indeed, computing all the paths from either `<entry/>` or `<sense/>` elements to the latter of length shorter or equal to 5 by a systematic traversal -of the graph yields exclusively paths (respectively 9042 and 39093 of them) -containing either a `<floatingText/>` or an `<app/>` element. The first one, as -its name aptly suggests, is used to encode text that does not quite fit the -regular flow of the document, as for example in the context of an embedded -narrative. Both examples displayed in the online documentation feature a -`<body/>` as direct child of `<floatingText/>`, neatly separating its content as -independent. The purpose of the second one, although its name — short for +of the graph yields exclusively paths (respectively 8 943 and 38 649 of them +excluding loops) containing either a `<floatingText/>` or an `<app/>` element. +The first one, as its name aptly suggests, is used to encode text that does not +quite fit the regular flow of the document, as for example in the context of an +embedded narrative. Both examples displayed in the online documentation feature +a `<body/>` as direct child of `<floatingText/>`, neatly separating its content +as independent. The purpose of the second one, although its name — short for apparatus — is less clear, is to wrap together several versions of the same excerpts, for instance when there are several possible readings of an unclear group of words in a manuscript, or when the encoder is trying to compile a @@ -487,21 +487,20 @@ structures like `<div/>`. # A new standard ? {#sec:new-standard} -Studying the content of *LGE* and considering several -articles in particular, one can identify structures which are specific to -encyclopedias and not compatible with the *dictionaries* module presented in the -previous section. It follows that this module is not able to encode arbitrary -encyclopedic content and propose a new fully TEI-compliant encoding scheme -remaining outside of it. The rest of the section is concerned with the needs of -automated encoding processes and compares the proposal with other strategies to -overcome the issues previously identified with the dedicated module for -dictionaries. +Studying the content of *LGE* and considering several articles in particular, +one can identify structures which are specific to encyclopedias and not +compatible with the *dictionaries* module presented in the previous section. It +follows that this module is not able to encode arbitrary encyclopedic content +and hence a new fully TEI-compliant encoding scheme is proposed. The rest of the +section is concerned with the needs of automated encoding processes and compares +the proposal with other strategies to overcome the issues previously identified +with the dedicated module for dictionaries. ## Idiosynchrasies of encyclopedias Browsing through the pages of an encyclopedia reveals a certain number of noticeable differences. A comprehensive list would be difficult to draw because -of the great variety in terms of editorial choices the most obvious can be +of the great variety in terms of editorial choices but the most obvious can be discussed. The first immediately visible feature that sets encyclopedias apart from @@ -560,7 +559,7 @@ describing their relation to events and other persons comes out even further from the notion of meaning. Entries such as the one about SANJO Sanetomi (see Figure @fig:sanjo) do not constitute a *definition*. -)](ressources/sanjo_t29.png){#fig:sanjo} +)](ressources/sanjo_t29.png){#fig:sanjo} Moreover, encyclopedias, because of all that they have inherited from the philosophical Enlightenment, are not only spaces designed to assert, they also -- GitLab