Commit c38691ae authored by Alice Brenon

Fix typos *I* found in the text

parent eed116b1
@@ -206,7 +206,7 @@ projects of the 18^th^ century. In this version, the definition
 was entirely reworked, mildly stating that good encyclopedias are difficult to
 make because of the amount of knowledge necessary and work needed to keep up
 with scientific progress instead of calling the effort a parody. It credits
-Chamber's *Cyclopædia* for being a decent attempt before referring anonymously
+Chambers' *Cyclopædia* for being a decent attempt before referring anonymously
 though quite explicitly to Diderot and d'Alembert's project by naming the
 collective "Une Société de gens de Lettres" and writing that it started in 1751.
 Even more importantly, two new entries were added after it: one for the
@@ -260,18 +260,17 @@ NENUFAR
 and BASNUM
 ([https://anr.fr/Projet-ANR-18-CE38-0003](https://anr.fr/Projet-ANR-18-CE38-0003))
 to encode respectively the *Petit Larousse Illustré* published by Pierre
-Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*
+Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*, and the
 *Dictionnaire Universel* by Furetière, or rather its second edition edited by
 Henri Basnage de Beauval, an encyclopedic dictionary from the very early 18^th^
 century [@williams2017, p. 1]. These successes suggested it to be a useful tool
 to encode encyclopedias but a few differences remained between both projects and
 DISCO-LGE: the text studied by NENUFAR does not have the encyclopedic dimension
-*LGE* has and BASNUM studies a much older text which had a tremendous influence on the
-european encyclopedic effort of the 18^th^ century but is not as clearly
-separated from the dictionaric stem as *La Grande Encyclopédie* is. For these
-reasons, the encoding schemes used in these projects could not be reused
-directly, prompting for a systematic exploration of the XML-TEI schema to devise
-a new one.
+*LGE* has and BASNUM studies a much older text which had a tremendous influence
+on the european encyclopedic effort of the 18^th^ century but is not as clearly
+separated from the dictionaric stem as *LGE* is. For these reasons, the encoding
+schemes used in these projects could not be reused directly, prompting for a
+systematic exploration of the XML-TEI schema to devise a new one.
 This chapter discusses XML elements and hence needs to name and manipulate them.
 They will be represented in a monospace font, in the standard XML autoclosing
@@ -315,14 +314,15 @@ The XML-TEI guidelines graph will hence be defined as follows. One node is
 created for each one of the 590 elements found in the specification. Then, an
 edge is placed between source node `A` and destination `B` if the schema states
 that the element represented by `B` can be contained directly under the element
-represented by `B`. That is, the edges in the graph represent the relation "is
-an admissible direct parent of". Please note that the word "element" is here
-used with the same meaning as in the TEI documentation to refer to the
-conceptual device characterised by a given tag name such as `p` or `div` and not
-to a particular instance of them that may occur in a given document. Figure
-@fig:dictionaries-subgraph, by using this transformation to display only the
-*dictionaries* module, hints at the overall complexity of the whole
-specification.
+represented by `A`. That is, the edges in the graph represent the relation "is
+an admissible direct parent of" (written infix, as in "A is connected to B" if
+and only if "A is an admissible direct parent of B"). Please note that the word
+"element" is here used with the same meaning as in the TEI documentation to
+refer to the conceptual device characterised by a given tag name such as `p` or
+`div` and not to a particular instance of them that may occur in a given
+document. Figure @fig:dictionaries-subgraph, by using this transformation to
+display only the *dictionaries* module, hints at the overall complexity of the
+whole specification.
 ![The subgraph of the *dictionaries* module](ressources/dictionaries.png){height=830px #fig:dictionaries-subgraph}
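The graph definition in the hunk above can be illustrated with a minimal sketch. It assumes the content model is available as a plain mapping from element name to admissible direct children. `CONTENT_MODEL` and `build_graph` are hypothetical names, and the excerpt lists only the inclusions mentioned elsewhere in the text, not the 590 elements of the real specification.

```python
from collections import defaultdict

# Hypothetical excerpt of a content model: parent -> admissible direct children.
# Only inclusions mentioned in the text are listed; this is not the full TEI schema.
CONTENT_MODEL = {
    "body": ["entry", "entryFree", "superEntry"],
    "entryFree": ["entry", "pos"],
    "superEntry": ["entry"],
    "entry": ["form", "gramGrp"],
    "form": ["pos"],
    "gramGrp": ["pos"],
    "pos": [],
}


def build_graph(content_model):
    """Return an adjacency mapping: A -> set of B such that
    A "is an admissible direct parent of" B."""
    graph = defaultdict(set)
    for parent, children in content_model.items():
        graph[parent]  # create the node even if it has no children
        for child in children:
            graph[child]  # create the node for the child as well
            graph[parent].add(child)
    return dict(graph)


graph = build_graph(CONTENT_MODEL)
```

With this orientation, an edge from `form` to `pos` exists while none goes from `entry` to `pos`, which matches the corrected sentence (`B` contained directly under `A`).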
@@ -362,13 +362,13 @@ Using inclusion paths lets one find for instance that although `<pos/>` may not
 be directly included within `<entry/>` elements to include information about the
 part-of-speech of the word that an article defines, the correct way to do so is
 through a `<form/>` or a `<gramGrp/>` because a thorough traversal reporting all
-the possible paths will contain `entry-form-pos` and `entry-grapmGrp-pos`. It is
+the possible paths will contain `entry-form-pos` and `entry-gramGrp-pos`. It is
 left to the human encoder to rate the relevance of the path found and to select
 an appropriate one. A total lack of path proves the impossibility of an
 inclusion; an abnormally high length for the shortest path is a serious hint
 that the inclusion should not be possible and is not meaningful.
-Another relevant example on the use of these methods can be given by querying
+Another relevant example of the use of these methods can be given by querying
 the shortest inclusion path of a `<pos/>` under the `<body/>` of the document:
 it yields an inclusion directly through `<entryFree/>` (with an inclusion path
 of length 2), which unlike `<entry/>` accepts it as a direct child node.
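The two queries discussed in this hunk, a shortest inclusion path and an exhaustive listing of paths up to a given length, can be sketched as follows. This is an illustration under stated assumptions and not the tooling used for the thesis: `GRAPH` is a hypothetical excerpt restricted to inclusions named in the text, path length is counted in edges (so `body-entryFree-pos` has length 2), and only loop-free paths are reported.

```python
from collections import deque

# Hypothetical excerpt of the elements graph: parent -> admissible direct children.
GRAPH = {
    "body": {"entry", "entryFree", "superEntry"},
    "entryFree": {"entry", "pos"},
    "superEntry": {"entry"},
    "entry": {"form", "gramGrp"},
    "form": {"pos"},
    "gramGrp": {"pos"},
    "pos": set(),
}


def shortest_inclusion_path(graph, source, target):
    """Breadth-first search: return one shortest path as a list, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for child in sorted(graph.get(path[-1], ())):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None  # no path at all: the inclusion is impossible


def inclusion_paths(graph, source, target, max_edges):
    """Yield every loop-free path from source to target with at most max_edges edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            yield path
            continue
        if len(path) - 1 < max_edges:
            for child in sorted(graph.get(path[-1], ())):
                if child not in path:  # skip loops
                    stack.append(path + [child])


print("-".join(shortest_inclusion_path(GRAPH, "body", "pos")))
print(sorted("-".join(p) for p in inclusion_paths(GRAPH, "entry", "pos", 2)))
```

On this toy excerpt the first query goes through `entryFree` and the second reports the `form` and `gramGrp` routes, in line with the text. On the full graph the same exhaustive traversal can return thousands of paths, hence the need for a human encoder to rate their relevance.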
@@ -387,7 +387,7 @@ associated to its definition. It is the natural way in from the `<body/>`
 element to the *dictionaries* module: indeed, although `<body/>` may also
 contain `<entryFree/>` or `<superEntry/>` elements, the former is a relaxed
 version of `<entry/>` while the latter is a device to group several related
-entries together. Both can contain an `<entry/` directly while no obvious
+entries together. Both can contain an `<entry/>` directly while no obvious
 inclusion exists the other way around: most (> 96.2%) of the inclusion paths of
 "reasonable" depth (which will be arbitrarily defined as strictly inferior to 5,
 that is twice the average shortest depth between any two nodes) either include
@@ -396,7 +396,7 @@ to appear in an article in general, showing that the purpose of `<entry/>` is
 not to contain an `<entryFree/>` or `<superEntry/>`. Hence, not only the
 semantics conveyed by the documentation but also the structure of the elements
 graph evidence `<entry/>` as the natural top-most element for an article. This
-example demonstrate again how a graph-centred approach can provide insights
+example demonstrates again how a graph-centred approach can provide insights
 about the XML-TEI schema.
 Once a block for an article is created, it may contain elements useful to
@@ -467,13 +467,13 @@ which belongs for example the `<ref/>` element), the *dictionaries* module
 appears somewhat isolated from important structural elements like `<head/>` or
 `<div/>`. Indeed, computing all the paths from either `<entry/>` or `<sense/>`
 elements to the latter of length shorter or equal to 5 by a systematic traversal
-of the graph yields exclusively paths (respectively 9042 and 39093 of them)
-containing either a `<floatingText/>` or an `<app/>` element. The first one, as
-its name aptly suggests, is used to encode text that does not quite fit the
-regular flow of the document, as for example in the context of an embedded
-narrative. Both examples displayed in the online documentation feature a
-`<body/>` as direct child of `<floatingText/>`, neatly separating its content as
-independent. The purpose of the second one, although its name — short for
+of the graph yields exclusively paths (respectively 8 943 and 38 649 of them
+excluding loops) containing either a `<floatingText/>` or an `<app/>` element.
+The first one, as its name aptly suggests, is used to encode text that does not
+quite fit the regular flow of the document, as for example in the context of an
+embedded narrative. Both examples displayed in the online documentation feature
+a `<body/>` as direct child of `<floatingText/>`, neatly separating its content
+as independent. The purpose of the second one, although its name — short for
 apparatus — is less clear, is to wrap together several versions of the same
 excerpts, for instance when there are several possible readings of an unclear
 group of words in a manuscript, or when the encoder is trying to compile a
@@ -487,21 +487,20 @@ structures like `<div/>`.
 # A new standard ? {#sec:new-standard}
-Studying the content of *LGE* and considering several
-articles in particular, one can identify structures which are specific to
-encyclopedias and not compatible with the *dictionaries* module presented in the
-previous section. It follows that this module is not able to encode arbitrary
-encyclopedic content and propose a new fully TEI-compliant encoding scheme
-remaining outside of it. The rest of the section is concerned with the needs of
-automated encoding processes and compares the proposal with other strategies to
-overcome the issues previously identified with the dedicated module for
-dictionaries.
+Studying the content of *LGE* and considering several articles in particular,
+one can identify structures which are specific to encyclopedias and not
+compatible with the *dictionaries* module presented in the previous section. It
+follows that this module is not able to encode arbitrary encyclopedic content
+and hence a new fully TEI-compliant encoding scheme is proposed. The rest of the
+section is concerned with the needs of automated encoding processes and compares
+the proposal with other strategies to overcome the issues previously identified
+with the dedicated module for dictionaries.
 ## Idiosynchrasies of encyclopedias
 Browsing through the pages of an encyclopedia reveals a certain number of
 noticeable differences. A comprehensive list would be difficult to draw because
-of the great variety in terms of editorial choices the most obvious can be
+of the great variety in terms of editorial choices but the most obvious can be
 discussed.
 The first immediately visible feature that sets encyclopedias apart from
@@ -560,7 +559,7 @@ describing their relation to events and other persons comes out even further
 from the notion of meaning. Entries such as the one about SANJO Sanetomi (see
 Figure @fig:sanjo) do not constitute a *definition*.
-![Begining of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo}
+![Beginning of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo}
 Moreover, encyclopedias, because of all that they have inherited from the
 philosophical Enlightenment, are not only spaces designed to assert, they also