@@ -189,7 +189,7 @@ d'Alembert insists on the importance of brevity for a clear definition in the
...
@@ -189,7 +189,7 @@ d'Alembert insists on the importance of brevity for a clear definition in the
consider encyclopedias superior to dictionaries but really as a new subgenre
consider encyclopedias superior to dictionaries but really as a new subgenre
departing from them in terms of purpose.
departing from them in terms of purpose.
# The *dictionaries* TEI module
# The *dictionaries* TEI module {#sec:dictionaries-module}
The XML-TEI standard has a modular structure consisting of optional parts each
The XML-TEI standard has a modular structure consisting of optional parts each
covering specific needs such as the physical features of a source document, the
covering specific needs such as the physical features of a source document, the
...
@@ -406,13 +406,13 @@ element made to group quotations with a bibliographic reference to their source
...
@@ -406,13 +406,13 @@ element made to group quotations with a bibliographic reference to their source
which should clearly be unnecessary to encode an article in the general case.
which should clearly be unnecessary to encode an article in the general case.
Secondly, although we have seen examples of connections from this module to the
Secondly, although we have seen examples of connections from this module to the
rest of the XML-TEI, especially to the *core* module (see the case of the
rest of the XML-TEI, especially to the *core* module (to which the `<ref/>`
`<ref/>` element above), the *dictionaries* module appears somewhat isolated
element belongs, for example), the *dictionaries* module appears somewhat
from important structural elements like `<head/>` or `<div/>`. Indeed, computing
isolated from important structural elements like `<head/>` or `<div/>`. Indeed,
all the paths from either `<entry/>` or `<sense/>` elements to the latter of
computing all the paths from either `<entry/>` or `<sense/>` elements to the
length at most 5 by a systematic traversal of the graph yields
latter of length at most 5 by a systematic traversal of the graph
exclusively paths (respectively 9042 and 39093 of them) containing either a
yields exclusively paths (respectively 9042 and 39093 of them) containing either
`<floatingText/>` or an `<app/>` element. The first one, as its name aptly
a `<floatingText/>` or an `<app/>` element. The first one, as its name aptly
suggests, is used to encode text that does not quite fit the regular flow of the
suggests, is used to encode text that does not quite fit the regular flow of the
document, as for example in the context of an embedded narrative. Both examples
document, as for example in the context of an embedded narrative. Both examples
displayed in the online documentation feature a `<body/>` as direct child of
displayed in the online documentation feature a `<body/>` as direct child of
...
@@ -433,12 +433,13 @@ structures like `<div/>`.
...
@@ -433,12 +433,13 @@ structures like `<div/>`.
Studying the content of *La Grande Encyclopédie* and considering several
Studying the content of *La Grande Encyclopédie* and considering several
articles in particular, we identify structures which are specific to
articles in particular, we identify structures which are specific to
encyclopedias and not compatible with the *dictionaries* module presented above.
encyclopedias and not compatible with the *dictionaries* module presented in the
We hence conclude that this module is not able to encode arbitrary encyclopedic
previous section. We hence conclude that this module is not able to encode
content and propose a new fully TEI-compliant encoding scheme remaining outside
arbitrary encyclopedic content and propose a new fully TEI-compliant encoding
of it. We proceed with remarks about the needs of automated encoding processes
scheme remaining outside of it. We proceed with remarks about the needs of
and compare our proposal with other strategies to overcome the issues previously
automated encoding processes and compare our proposal with other strategies to
identified with the dedicated module for dictionaries.
overcome the issues previously identified with the dedicated module for
dictionaries.
## Idiosyncrasies of encyclopedias
## Idiosyncrasies of encyclopedias
...
@@ -455,7 +456,7 @@ system. Those generally cover a broad range of subjects from scientific
...
@@ -455,7 +456,7 @@ system. Those generally cover a broad range of subjects from scientific
disciplines to literature, and extending to political subjects and law.
disciplines to literature, and extending to political subjects and law.
No element in the *dictionaries* module is explicitly designed for the purpose
No element in the *dictionaries* module is explicitly designed for the purpose
of encoding these indicators. As we have seen above, the element set is geared
of encoding these indicators. As we have seen, the element set is geared
towards the words themselves instead of the concept they represent. The closest
towards the words themselves instead of the concept they represent. The closest
tool for what we need is found in the `<usg/>` element used with a specific
tool for what we need is found in the `<usg/>` element used with a specific
`type` attribute set to `dom` for "domain". Indeed several examples from the
`type` attribute set to `dom` for "domain". Indeed several examples from the
...
@@ -553,8 +554,6 @@ and `<title/>` — the latter with the possibility to set its `type` attribute t
...
@@ -553,8 +554,6 @@ and `<title/>` — the latter with the possibility to set its `type` attribute t
`sub` — stand out as the best candidates for the semantics condition on the
`sub` — stand out as the best candidates for the semantics condition on the
second element.
second element.
#### Candidates in the *dictionaries* module {-}
Filtering the content of the module to keep only the elements which can at the
Filtering the content of the module to keep only the elements which can at the
same time contain themselves, be included under `<entry/>` and include a `<p/>`
same time contain themselves, be included under `<entry/>` and include a `<p/>`
and either the `<head/>` or `<title/>` elements yields absolutely no candidates.
and either the `<head/>` or `<title/>` elements yields absolutely no candidates.
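
The filter described in this sentence can be sketched as follows. This is only a minimal illustration, assuming a content model stored as a mapping from each element to the set of elements it may directly contain. The `TOY_MODEL` data and the `section_candidates` helper are hypothetical: they are invented for this sketch and do not reproduce the actual TEI schema nor the tooling used for the study.

```python
from typing import Dict, Set

# Hypothetical, hand-written content model: element -> direct children allowed.
# It is NOT the real TEI schema, only a toy to exercise the filter.
TOY_MODEL: Dict[str, Set[str]] = {
    "entry": {"form", "sense", "note"},
    "form": {"orth"},
    "sense": {"sense", "def"},
    "def": set(),
    "note": {"note", "p", "head"},
}


def section_candidates(model: Dict[str, Set[str]], pool: Set[str]) -> Set[str]:
    """Return the elements of `pool` meeting the three constraints of the text."""
    found = set()
    for name in pool:
        children = model.get(name, set())
        # 1. the element can contain itself (arbitrary nesting depth)
        nests = name in children
        # 2. the element can appear directly under <entry/>
        under_entry = name in model.get("entry", set())
        # 3. it can hold a <p/> and either a <head/> or a <title/>
        holds_text = "p" in children and bool({"head", "title"} & children)
        if nests and under_entry and holds_text:
            found.add(name)
    return found


# Restricting the pool to a (toy) dictionary-like module leaves nothing,
# while widening the pool to every element of the toy model finds one match.
dictionaries_like = {"entry", "form", "sense", "def"}
print(section_candidates(TOY_MODEL, dictionaries_like))
print(section_candidates(TOY_MODEL, set(TOY_MODEL)))
```

With real data, the mapping would have to be derived from the TEI schema itself; the sketch only shows how the three constraints combine into a single predicate.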
...
@@ -590,8 +589,6 @@ elements could fulfill our purpose, it is a fact than no element in this module
...
@@ -590,8 +589,6 @@ elements could fulfill our purpose, it is a fact than no element in this module
appears as an obvious good solution and a serious hint to keep looking somewhere
appears as an obvious good solution and a serious hint to keep looking somewhere
else.
else.
#### Widening the search {-}
We hence widen our search to include elements outside the *dictionaries* module
We hence widen our search to include elements outside the *dictionaries* module
which could be used to encode our sections and subsections, under the same
which could be used to encode our sections and subsections, under the same
constraint as before to try and find a composite solution that would remain
constraint as before to try and find a composite solution that would remain
...
@@ -617,25 +614,26 @@ comment" which appears "out of the main textual stream" whereas the long
...
@@ -617,25 +614,26 @@ comment" which appears "out of the main textual stream" whereas the long
developments in articles are the very matter of the text of encyclopedias, not
developments in articles are the very matter of the text of encyclopedias, not
mere remarks in the margins or at the foot of pages.
mere remarks in the margins or at the foot of pages.
## Encoding within the *core* module
## Encoding within the *core* module {#sec:core-module}
The above remarks explain why the *dictionaries* module is unable to represent
The remarks made in section @sec:dictionaries-module explain why the
encyclopedias, where the notion of "meaning" is less central than in
*dictionaries* module is unable to represent encyclopedias, where the notion of
dictionaries and where discourse with nested structures of arbitrary depth can
"meaning" is less central than in dictionaries and where discourse with nested
occur. Even composite encodings using elements outside of the *dictionaries*
structures of arbitrary depth can occur. Even composite encodings using elements
module under an `<entry/>` element do not meet our requirements. Since the
outside of the *dictionaries* module under an `<entry/>` element do not meet our
*core* module of course accommodates these structures by means of the `<div/>`,
requirements. Since the *core* module obviously accommodates these structures by
`<head/>` and `<p/>` elements which have the additional advantage of carrying
means of the `<div/>`, `<head/>` and `<p/>` elements which have the additional
less semantic payload than `<sense/>` or `<def/>`, we devise an encoding scheme
advantage of carrying less semantic payload than `<sense/>` or `<def/>`, we
using them which we recommend using for other projects aiming at representing
devise an encoding scheme using them which we recommend using for other projects
encyclopedias.
aiming at representing encyclopedias.
To remain consistent with the above remarks we will only concern ourselves with
To remain consistent with the way we studied the *dictionaries* module we will
what happens at the level of each article, right under the `<body/>` element.
only concern ourselves with what happens at the level of each article, right
Everything related to metadata happens as expected in the file's `<teiHeader/>`
under the `<body/>` element. Everything related to metadata happens as expected
which is well enough equipped to handle them. In order to present our scheme
in the file's `<teiHeader/>` which is well enough equipped to handle them. In
throughout the following section we will be progressively encoding a reference
order to present our scheme throughout the following section we will be
article, "Cathète" from tome 9 reproduced in Figure @fig:cathete-photo.
progressively encoding a reference article, "Cathète" from tome 9 reproduced in
Figure @fig:cathete-photo.
)](ressources/cathète_t9.png){#fig:cathete-photo}
)](ressources/cathète_t9.png){#fig:cathete-photo}
...
@@ -748,15 +746,14 @@ encoding scheme as demonstrated by Figure @fig:alcala-xml.
...
@@ -748,15 +746,14 @@ encoding scheme as demonstrated by Figure @fig:alcala-xml.
{#fig:alcala-xml}
{#fig:alcala-xml}
The reference implementation for this encoding scheme is the program
The reference implementation for this encoding scheme is the program soprano