Commit c38691ae authored by Alice Brenon

Fix typos *I* found in the text

parent eed116b1
@@ -206,7 +206,7 @@ projects of the 18^th^ century. In this version, the definition
was entirely reworked, mildly stating that good encyclopedias are difficult to
make because of the amount of knowledge necessary and work needed to keep up
with scientific progress instead of calling the effort a parody. It credits
-Chamber's *Cyclopædia* for being a decent attempt before referring anonymously
+Chambers' *Cyclopædia* for being a decent attempt before referring anonymously
though quite explicitly to Diderot and d'Alembert's project by naming the
collective "Une Société de gens de Lettres" and writing that it started in 1751.
Even more importantly, two new entries were added after it: one for the
@@ -260,18 +260,17 @@ NENUFAR
and BASNUM
([https://anr.fr/Projet-ANR-18-CE38-0003](https://anr.fr/Projet-ANR-18-CE38-0003))
to encode respectively the *Petit Larousse Illustré* published by Pierre
-Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*
+Larousse in 1905 [@bohbot2018, p. 1], roughly contemporary to *LGE*, and the
*Dictionnaire Universel* by Furetière, or rather its second edition edited by
Henri Basnage de Beauval, an encyclopedic dictionary from the very early 18^th^
century [@williams2017, p. 1]. These successes suggested it to be a useful tool
to encode encyclopedias but a few differences remained between both projects and
DISCO-LGE: the text studied by NENUFAR does not have the encyclopedic dimension
-*LGE* has and BASNUM studies a much older text which had a tremendous influence on the
-european encyclopedic effort of the 18^th^ century but is not as clearly
-separated from the dictionaric stem as *La Grande Encyclopédie* is. For these
-reasons, the encoding schemes used in these projects could not be reused
-directly, prompting for a systematic exploration of the XML-TEI schema to devise
-a new one.
+*LGE* has and BASNUM studies a much older text which had a tremendous influence
+on the european encyclopedic effort of the 18^th^ century but is not as clearly
+separated from the dictionaric stem as *LGE* is. For these reasons, the encoding
+schemes used in these projects could not be reused directly, prompting for a
+systematic exploration of the XML-TEI schema to devise a new one.
This chapter discusses XML elements and hence needs to name and manipulate them.
They will be represented in a monospace font, in the standard XML autoclosing
@@ -315,14 +314,15 @@ The XML-TEI guidelines graph will hence be defined as follows. One node is
created for each one of the 590 elements found in the specification. Then, an
edge is placed between source node `A` and destination `B` if the schema states
that the element represented by `B` can be contained directly under the element
-represented by `B`. That is, the edges in the graph represent the relation "is
-an admissible direct parent of". Please note that the word "element" is here
-used with the same meaning as in the TEI documentation to refer to the
-conceptual device characterised by a given tag name such as `p` or `div` and not
-to a particular instance of them that may occur in a given document. Figure
-@fig:dictionaries-subgraph, by using this transformation to display only the
-*dictionaries* module, hints at the overall complexity of the whole
-specification.
+represented by `A`. That is, the edges in the graph represent the relation "is
+an admissible direct parent of" (written infix, as in "A is connected to B" if
+and only if "A is an admissible direct parent of B"). Please note that the word
+"element" is here used with the same meaning as in the TEI documentation to
+refer to the conceptual device characterised by a given tag name such as `p` or
+`div` and not to a particular instance of them that may occur in a given
+document. Figure @fig:dictionaries-subgraph, by using this transformation to
+display only the *dictionaries* module, hints at the overall complexity of the
+whole specification.
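
As an illustration of the graph definition in this hunk, the following is a minimal sketch in Python, not the code used by the author. The `CONTENT_MODEL` mapping and the `build_graph` function are hypothetical names introduced here, and the mapping is a hand-written excerpt limited to containment relations mentioned elsewhere in the text; the real specification would provide all 590 elements.

```python
# Hypothetical excerpt of a content model: each element name maps to the
# elements it may directly contain. Only relations mentioned in the text are
# listed; this is NOT the full XML-TEI specification.
CONTENT_MODEL = {
    "body": ["entry", "entryFree", "superEntry"],
    "entryFree": ["entry", "pos"],
    "superEntry": ["entry"],
    "entry": ["form", "gramGrp"],
    "form": ["pos"],
    "gramGrp": ["pos"],
    "pos": [],
}


def build_graph(content_model):
    """Return adjacency sets where an edge A -> B means
    "A is an admissible direct parent of B"."""
    # One node per element, even if it has no admissible children.
    graph = {element: set() for element in content_model}
    for parent, children in content_model.items():
        for child in children:
            graph.setdefault(child, set())  # child may not be a key yet
            graph[parent].add(child)
    return graph


graph = build_graph(CONTENT_MODEL)
print(sorted(graph["entry"]))  # elements that may appear directly under entry
```

Adjacency sets keep the edge direction explicit (source `A` is the parent, destination `B` the child), which matches the corrected wording of the hunk.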
![The subgraph of the *dictionaries* module](ressources/dictionaries.png){height=830px #fig:dictionaries-subgraph}
@@ -362,13 +362,13 @@ Using inclusion paths lets one find for instance that although `<pos/>` may not
be directly included within `<entry/>` elements to include information about the
part-of-speech of the word that an article defines, the correct way to do so is
through a `<form/>` or a `<gramGrp/>` because a thorough traversal reporting all
-the possible paths will contain `entry-form-pos` and `entry-grapmGrp-pos`. It is
+the possible paths will contain `entry-form-pos` and `entry-gramGrp-pos`. It is
left to the human encoder to rate the relevance of the path found and to select
an appropriate one. A total lack of path proves the impossibility of an
inclusion; an abnormally high length for the shortest path is a serious hint
that the inclusion should not be possible and is not meaningful.
-Another relevant example on the use of these methods can be given by querying
+Another relevant example of the use of these methods can be given by querying
the shortest inclusion path of a `<pos/>` under the `<body/>` of the document:
it yields an inclusion directly through `<entryFree/>` (with an inclusion path
of length 2), which unlike `<entry/>` accepts it as a direct child node.
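
The two queries described in this hunk (enumerating inclusion paths and finding a shortest one) can be sketched as follows. This is an assumption-laden illustration rather than the author's implementation: `GRAPH` is a hypothetical toy graph restricted to relations stated in the text, and `shortest_path` and `all_paths` are names chosen here.

```python
from collections import deque

# Hypothetical toy graph: an edge A -> B means "A is an admissible direct
# parent of B". Only relations mentioned in the text are included.
GRAPH = {
    "body": {"entry", "entryFree", "superEntry"},
    "entryFree": {"entry", "pos"},
    "superEntry": {"entry"},
    "entry": {"form", "gramGrp"},
    "form": {"pos"},
    "gramGrp": {"pos"},
    "pos": set(),
}


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path as a list of element
    names, or None when no inclusion path exists."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for child in sorted(graph[path[-1]]):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None


def all_paths(graph, source, target, max_edges):
    """Enumerate loop-free paths from source to target using at most
    max_edges edges, formatted like `entry-form-pos`."""
    def walk(path):
        if path[-1] == target:
            yield path
            return
        if len(path) - 1 >= max_edges:
            return
        for child in sorted(graph[path[-1]]):
            if child not in path:  # skip loops
                yield from walk(path + [child])
    return ["-".join(p) for p in walk([source])]


print(all_paths(GRAPH, "entry", "pos", max_edges=2))
print(shortest_path(GRAPH, "body", "pos"))
```

On this toy graph the enumeration from `entry` to `pos` goes through `form` or `gramGrp`, and the shortest path from `body` to `pos` goes through `entryFree` in two edges, consistent with the text. A `None` result corresponds to the "total lack of path" case.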
@@ -387,7 +387,7 @@ associated to its definition. It is the natural way in from the `<body/>`
element to the *dictionaries* module: indeed, although `<body/>` may also
contain `<entryFree/>` or `<superEntry/>` elements, the former is a relaxed
version of `<entry/>` while the latter is a device to group several related
-entries together. Both can contain an `<entry/` directly while no obvious
+entries together. Both can contain an `<entry/>` directly while no obvious
inclusion exists the other way around: most (> 96.2%) of the inclusion paths of
"reasonable" depth (which will be arbitrarily defined as strictly inferior to 5,
that is twice the average shortest depth between any two nodes) either include
@@ -396,7 +396,7 @@ to appear in an article in general, showing that the purpose of `<entry/>` is
not to contain an `<entryFree/>` or `<superEntry/>`. Hence, not only the
semantics conveyed by the documentation but also the structure of the elements
graph evidence `<entry/>` as the natural top-most element for an article. This
-example demonstrate again how a graph-centred approach can provide insights
+example demonstrates again how a graph-centred approach can provide insights
about the XML-TEI schema.
Once a block for an article is created, it may contain elements useful to
@@ -467,13 +467,13 @@ which belongs for example the `<ref/>` element), the *dictionaries* module
appears somewhat isolated from important structural elements like `<head/>` or
`<div/>`. Indeed, computing all the paths from either `<entry/>` or `<sense/>`
elements to the latter of length shorter or equal to 5 by a systematic traversal
-of the graph yields exclusively paths (respectively 9042 and 39093 of them)
-containing either a `<floatingText/>` or an `<app/>` element. The first one, as
-its name aptly suggests, is used to encode text that does not quite fit the
-regular flow of the document, as for example in the context of an embedded
-narrative. Both examples displayed in the online documentation feature a
-`<body/>` as direct child of `<floatingText/>`, neatly separating its content as
-independent. The purpose of the second one, although its name — short for
+of the graph yields exclusively paths (respectively 8 943 and 38 649 of them
+excluding loops) containing either a `<floatingText/>` or an `<app/>` element.
+The first one, as its name aptly suggests, is used to encode text that does not
+quite fit the regular flow of the document, as for example in the context of an
+embedded narrative. Both examples displayed in the online documentation feature
+a `<body/>` as direct child of `<floatingText/>`, neatly separating its content
+as independent. The purpose of the second one, although its name — short for
apparatus — is less clear, is to wrap together several versions of the same
excerpts, for instance when there are several possible readings of an unclear
group of words in a manuscript, or when the encoder is trying to compile a
@@ -487,21 +487,20 @@ structures like `<div/>`.
# A new standard ? {#sec:new-standard}
-Studying the content of *LGE* and considering several
-articles in particular, one can identify structures which are specific to
-encyclopedias and not compatible with the *dictionaries* module presented in the
-previous section. It follows that this module is not able to encode arbitrary
-encyclopedic content and propose a new fully TEI-compliant encoding scheme
-remaining outside of it. The rest of the section is concerned with the needs of
-automated encoding processes and compares the proposal with other strategies to
-overcome the issues previously identified with the dedicated module for
-dictionaries.
+Studying the content of *LGE* and considering several articles in particular,
+one can identify structures which are specific to encyclopedias and not
+compatible with the *dictionaries* module presented in the previous section. It
+follows that this module is not able to encode arbitrary encyclopedic content
+and hence a new fully TEI-compliant encoding scheme is proposed. The rest of the
+section is concerned with the needs of automated encoding processes and compares
+the proposal with other strategies to overcome the issues previously identified
+with the dedicated module for dictionaries.
## Idiosynchrasies of encyclopedias
Browsing through the pages of an encyclopedia reveals a certain number of
noticeable differences. A comprehensive list would be difficult to draw because
-of the great variety in terms of editorial choices the most obvious can be
+of the great variety in terms of editorial choices but the most obvious can be
discussed.
The first immediately visible feature that sets encyclopedias apart from
@@ -560,7 +559,7 @@ describing their relation to events and other persons comes out even further
from the notion of meaning. Entries such as the one about SANJO Sanetomi (see
Figure @fig:sanjo) do not constitute a *definition*.
-![Begining of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo}
+![Beginning of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29 ([BnF - Gallica](http://ark.bnf.fr/ark:/12148/cb41651490t))](ressources/sanjo_t29.png){#fig:sanjo}
Moreover, encyclopedias, because of all that they have inherited from the
philosophical Enlightenment, are not only spaces designed to assert, they also