Commit 07d82169 authored by Alice Brenon

Some clarification of the description + added structural remarks on the module

parent e5b7f50f
@@ -183,30 +183,32 @@ Using classical, well-known methods such as Dijkstra's algorithm (Dijkstra, 1959
 allows us to explore the shortest inclusion paths that exist between elements.
 Though a particular caution should be applied because there is no guarantee that
 the shortest path is meaningful in general, it at least provides us with an
-efficient way to check whether a given element may or not be nested under
-another one at all and gives an order of magnitude on the length of the path to
-expect. Of course the accuracy of this heuristic decreases as the length of the
-elements increases in a perfect graph representing the intended, meaningful path
-between two nodes, but this formalism lets us consider elements combinations
-rationally and exhaustively by algorithmic means.
+efficient way to check whether a given element may or not be nested at all under
+another one and gives an order of magnitude on the length of the path to expect.
+Of course the accuracy of this heuristic decreases as the length of the elements
+increases in a perfect graph representing the intended, meaningful path between
+two nodes, but the general graph formalism enables us to extend the results
+produced by the shortest-path approach and consider elements combinations
+rationally and exhaustively by algorithmic means should the need occur.
 
 For instance, it lets one find that although `<pos/>` may not be directly
 included within `<entry/>` elements to include information about the
 part-of-speech of the word that an article defines, the correct way to do so is
-through a `<gramGrp/>`. On the other hand, trying to discover the shortest
-inclusion path to `<pos/>` from the `<TEI/>` root of the document yields a
-`<standOff/>`, an element dedicated to store contextual data that accompanies
-but is not part of the text, not unlike an annex, and probably not what we want
-in the context of encoding an encyclopedia. A last relevant example on the use
-of this approach can be given by querying the shortest inclusion path of a
-`<pos/>` under the `<body/>` of the document: it yields an inclusion directly
-through `<entryFree/>` (with an inclusion path of length 2), which, unlike
-`<entry/>` allows it as a direct child node. Possibly not what we want depending
-on the regularity of the articles we are encoding and the existence of other
-grammatical information such as `<case/>` or `<gen/>` in languages with an
-inflexion system to justify the use of the `<gramGrp/>`, but it gives a good
-general idea: `<pos/>` does not need to be nested very deep, it can appear quite
-near the "surface" of article entries.
+through a `<form/>` or a `<gramGrp/>`. On the other hand, trying to discover the
+shortest inclusion path to `<pos/>` from the `<TEI/>` root of the document
+yields a `<standOff/>`, an element dedicated to store contextual data that
+accompanies but is not part of the text, not unlike an annex, and widely
+unrelated to the context of encoding an encyclopedia. A last relevant example on
+the use of these methods can be given by querying the shortest inclusion path of
+a `<pos/>` under the `<body/>` of the document: it yields an inclusion directly
+through `<entryFree/>` (with an inclusion path of length 2), which unlike
+`<entry/>` accepts it as a direct child node. Possibly not what we want
+depending on the regularity of the articles we are encoding and the occurrence
+of other grammatical information such as `<case/>` or `<gen/>` to justify the
+use of the `<gramGrp/>`, but searching exhaustively for paths up to length 3
+returns as expected the path through `<entry/>`, among others. Overall, we get a
+good general idea: `<pos/>` does not need to be nested very deep, it can appear
+quite near the "surface" of article entries.
 
 ### The `<entry/>` element
 
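The two kinds of query discussed in this hunk (a shortest inclusion path, then an exhaustive search for paths up to a given length) can be sketched as follows. This is only a minimal illustration, not the tooling used for the article: the `INCLUSIONS` dictionary is a hand-written toy graph whose edges loosely follow the examples in the text and do not reproduce the real XML-TEI schema, and the function names `shortest_path` and `paths_up_to` are hypothetical. Since every inclusion edge has the same weight, a breadth-first search is used; it returns the same path lengths as Dijkstra's algorithm with unit weights.

```python
from collections import deque

# Toy inclusion graph (assumption, NOT the real TEI schema):
# each element maps to the elements it may directly contain.
INCLUSIONS = {
    "TEI": ["text", "standOff"],
    "text": ["body"],
    "body": ["entry", "entryFree"],
    "entry": ["form", "gramGrp", "sense"],
    "entryFree": ["form", "gramGrp", "pos"],
    "form": ["pos"],
    "gramGrp": ["pos"],
    "standOff": ["pos"],
    "sense": [],
    "pos": [],
}


def shortest_path(graph, source, target):
    """Breadth-first search: return one shortest path as a list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for child in graph.get(node, []):
            if child not in previous:
                previous[child] = node
                queue.append(child)
    return None


def paths_up_to(graph, source, target, max_length):
    """Return every path from source to target with at most max_length edges."""
    found = []

    def explore(path):
        if len(path) > 1 and path[-1] == target:
            found.append(path)
        if len(path) - 1 < max_length:
            for child in graph.get(path[-1], []):
                explore(path + [child])

    explore([source])
    return found


print(shortest_path(INCLUSIONS, "TEI", "pos"))   # ['TEI', 'standOff', 'pos']
print(shortest_path(INCLUSIONS, "body", "pos"))  # ['body', 'entryFree', 'pos']
for path in paths_up_to(INCLUSIONS, "body", "pos", 3):
    print(" > ".join(path))
```

On this toy graph the shortest path from `body` only reveals `entryFree`, while the exhaustive search up to length 3 also lists the paths through `entry`, which mirrors the behaviour described in the text.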
@@ -234,7 +236,8 @@ represent features such as
 - a group of grammatical information: `<gramGrp/>`, that may itself contain as
 we've seen above `<case/>`, `<gen/>`, `<number/>` or `<pers/>` to describe the
 form itself for instance, but also information about the categories it belongs
-to like `<iType/>` for its inflexion class or `<pos/>` for its part-of-speech
+to like `<iType/>` for its inflection class in languages with a declension
+system or `<pos/>` for its part-of-speech
 - its etymology
 - its variants if there is a different spelling in a variety of the language or
 if it has changed through time
@@ -274,7 +277,38 @@ definition of the term with `<def/>`, usage examples with `<usg/>` and other
 high-level information such as translations in other languages. Both `<def/>`
 and `<usg/>` elements may appear directly under the `<entry/>`.
 
-### Remarks about structure
+### Structural remarks
+
+Before concluding this description of the *dictionaries* module from the
+perspective of someone trying to concretely encode a particular dictionary or
+encyclopedia, we make use of the graph approach again to evidence some of its
+aspects in terms of inclusion structure.
+
+First, it is remarkable that all elements in the *dictionaries* module have a
+cyclic inclusion path, that is to say, there is an inclusion path from each
+element of this module to itself. Although having such a cycle is a widespread
+property in the remainder of XML-TEI elements shared by 73.9% of them (413 out
+of the 559 elements in the other modules), all 31 elements of the *dictionaries*
+module having one is far above this average. In addition, the cycles appear to
+be rather short, with an average length of 1.96 versus 2.50 in the rest of the
+population. This observation is all the more surprising considering the fact
+that the *dictionaries* module contains short "leaf" elements like `<pos/>`
+which do not obviously need to admit cycles since one rather expects them to
+contain only one word, like `<pos>adj</pos>` in the example given in the
+official documentation.
+
+Secondly, although we have seen examples of connections from this module to the
+rest of the XML-TEI, especially the *core* module (see the case of the `<ref/>`
+element above), the *dictionaries* module appears somewhat isolated from important
+structural elements like `<head/>` or `<div/>`. Indeed, computing all the paths
+from either `<entry/>` or `<sense/>` elements to the latter of length less than or
+equal to 5 by a systematic traversal of the graph yields exclusively paths
+(respectively 9042 and 39093 of them) containing either a `<floatingText/>` or
+an `<app/>` element. The first one is used to encode
+
+Thus, despite a rather dense internal connectivity, the *dictionaries* module
+fails to provide encoders with a device to represent recursively nesting
+structures like `<div/>`.
 
 # A new standard ?
 
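The two measurements reported in the structural remarks (which elements lie on a cycle and how short that cycle is, and whether every bounded path to `<div/>` crosses a given element) can be sketched in the same spirit. Again this is a hedged sketch under stated assumptions: `GRAPH` is an invented toy graph that does not reflect the real XML-TEI schema, the helper names are hypothetical, and the length bound of 6 was picked for the toy data, not taken from the article.

```python
from collections import deque

# Toy inclusion graph (assumption, NOT the real TEI schema).
GRAPH = {
    "entry": ["sense"],
    "sense": ["sense", "cit"],
    "cit": ["sense", "floatingText"],
    "floatingText": ["body"],
    "body": ["div"],
    "div": ["div", "head"],
    "head": [],
}


def shortest_cycle_length(graph, node):
    """Number of edges of the shortest path from node back to itself, or None."""
    distance = {}
    queue = deque()
    for child in graph.get(node, []):
        if child not in distance:
            distance[child] = 1
            queue.append(child)
    while queue:
        current = queue.popleft()
        if current == node:
            return distance[current]
        for child in graph.get(current, []):
            if child not in distance:
                distance[child] = distance[current] + 1
                queue.append(child)
    return None


def paths_up_to(graph, source, target, max_length):
    """Yield every path from source to target with at most max_length edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if len(path) > 1 and path[-1] == target:
            yield path
        if len(path) - 1 < max_length:
            for child in graph.get(path[-1], []):
                stack.append(path + [child])


# Share of elements lying on a cycle, and mean length of their shortest cycle.
cycles = {}
for element in GRAPH:
    length = shortest_cycle_length(GRAPH, element)
    if length is not None:
        cycles[element] = length
share = len(cycles) / len(GRAPH)
mean_length = sum(cycles.values()) / len(cycles)
print(f"{len(cycles)} of {len(GRAPH)} elements on a cycle ({share:.1%}), "
      f"mean shortest cycle length {mean_length:.2f}")

# Do all bounded paths from entry to div cross floatingText?
paths = list(paths_up_to(GRAPH, "entry", "div", 6))
all_via_floating_text = all("floatingText" in path for path in paths)
print(len(paths), "paths, all through floatingText:", all_via_floating_text)
```

Enumerating paths this way grows quickly with the length bound, which is consistent with the tens of thousands of paths reported for a bound of 5 on the full graph.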