Commit 847c5286 authored by Alice Brenon

Finish explaining why *dictionaries* doesn't work for encyclopedias

parent 80aa6795
@@ -425,23 +425,135 @@ relevant.
### The notion of meaning
Notwithstanding the correct way to represent domains of knowledge, their extent
itself raises concerns regarding the *dictionaries* module. Indeed, the vast
collection of domains covered sometimes includes historical articles and
biographies. Even if the notion of meaning can appear ill-fitting for a text
describing a series of historical events, one may still argue that it groups
them into a concept and associates it with the name of the event. But when it
comes to relating the life of a person, describing their relation to events and
other persons strays even further from the notion of meaning. To what extent
is it relevant to consider that having discovered such and such a thing, or
having been born at a certain time in a certain place, *defines* someone?
![Beginning of the article relating the life of SANJO Sanetomi, in La Grande Encyclopédie, tome 29](ressources/sanjo_t29.png)
Moreover, encyclopedias, inheriting as much as they have from the philosophical
Enlightenment, are not only spaces designed to assert, they also intrinsically
include an interrogative component. Some articles lay down the basis required to
understand the complexity of an issue and invite the reader to consider it
without providing a definite answer, going as far as explicitly using
question marks.
![Excerpt from article "Action", in La Grande Encyclopédie, tome 1](ressources/action_t1.png)
In this extract, the author devises a hypothetical situation to illustrate how
difficult it is to draw the line between two supposedly mutually exclusive
subcategories of legal actions. The whole point of the passage is to convey the
idea that the term eludes definition: wrapping it in a `<sense/>` or, worse, a
`<def/>` element would be an utter misnomer.
As a result, the use of `<sense/>` and `<def/>` is not appropriate for
encyclopedic content in general.
### Nested structures
The final difficulty can be considered a partial consequence of the previous
one on the structure of articles. The difficulty of defining complex concepts is
the very reason why authors approach their subjects from various angles,
circumnavigating them as a best approximation. This strategy favours long,
structured developments with sections and subsections covering the multiple
aspects of the topic: from a historical, political, scientific point of view…
The longest articles can thus span several dozen pages. They can contain
substructures with titles on at least three levels (for instance, an `a)` under a
`1)` under a `I.`), each of which is in turn generally developed over several
paragraphs.
![La Grande Encyclopédie, tome 16, article "Europe" spans from p.782 to p.846, that is 64 pages and ends after over a column of bibliography](ressources/europe_t16.png)
The nested structure that we have just evidenced of course demands a nesting
structure to accommodate it. More precisely, it guides our search for XML
elements by giving us several constraints. We are looking for a pair of
elements: the first, representing a (sub)section, must be able to include both
itself and the second, which carries the section title. The second element has
no special constraint beyond the one it shares with the first, which is to have
semantics compatible with our purpose. In addition, the first element must be
able to contain several `<p/>` elements, `<p/>` being the reference element to
encode paragraphs according to the XML-TEI documentation.
We have seen that the *dictionaries* module was equipped with a questionable but
possible element for subject domains. However, it does not include any element
for section titles. In the rest of the TEI specification, the elements `<head/>`
and `<title/>` — the latter with the possibility to set its `type` attribute to
`sub` — stand out as the best candidates for the semantics condition on the
second element.
#### Candidates in the *dictionaries* module
Filtering the content of the module to keep only the elements which can at the
same time contain themselves, be included under `<entry/>` and include a `<p/>`
and either the `<head/>` or `<title/>` elements yields absolutely no candidates.
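
The query described above can be sketched as a filter over an inclusion graph. This is only a minimal illustration: the `CONTAINS` table below is a small hand-written toy, not an extract of the TEI schema, and the names `CONTAINS`, `DICTIONARIES_MODULE` and `strict_candidates` are hypothetical helpers introduced for this example.

```python
# Toy inclusion graph: CONTAINS[x] is the set of elements allowed directly
# inside x. Hand-written for illustration, NOT an extract of the TEI schema.
CONTAINS = {
    "entry": {"sense", "form", "note"},
    "sense": {"sense", "def", "note"},
    "note": {"p"},
    "div": {"div", "head", "p"},
    "def": set(),
    "form": set(),
    "p": set(),
    "head": set(),
}

# Elements of the toy graph that belong to the *dictionaries* module.
DICTIONARIES_MODULE = {"entry", "sense", "def", "form"}


def strict_candidates(contains, module, root="entry", paragraph="p",
                      titles=("head", "title")):
    """Return the module elements that sit directly under `root`, can contain
    themselves, can contain a paragraph and can contain a title element."""
    found = []
    for element in module:
        children = contains.get(element, set())
        if (
            element in contains.get(root, set())      # included under <entry/>
            and element in children                   # contains itself
            and paragraph in children                 # contains <p/>
            and any(t in children for t in titles)    # contains <head/> or <title/>
        ):
            found.append(element)
    return sorted(found)


print(strict_candidates(CONTAINS, DICTIONARIES_MODULE))
```

On this toy graph the function returns an empty list: `sense` can contain itself but neither a paragraph nor a title, while `div` satisfies the three containment conditions but is neither in the module nor allowed under `entry`.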
The lack of results from this simple query forces us to somewhat relax the
constraints on the elements we are willing to use. We can for instance make the
assumption that an intermediate element could be needed between the `<entry/>`
element and the recursing one used to encode sections. This "section" element
could also need a companion element to be able to include itself or, to
formalise it in terms of graph theory, we could relax the condition that this
element admit a loop and consider a cycle of a given length to be enough (a
small one, since it still needs to represent a fairly direct inclusion). We
simultaneously extend the maximum depth of the inclusion paths we are looking
for between `<entry/>`, the pair of elements and the `<p/>` element.
By setting this depth to 3, that is, by accepting one intermediate element in
the middle of each of the inclusion paths that define the structure required to
encode encyclopedic discourse, we find 21 elements, but none of them stands out
as an obvious good solution: all paths to include the `<p/>` element from any
*dictionaries* element contain either a `<figure/>` (which we encountered
earlier when we were practising our graph approach to search for inclusions
between `<entry/>` and `<entryFree/>` and dismissed as not useful in general), a
`<stage/>` (reserved for stage directions in dramatic works) or a `<state/>`
(used to describe a temporary quality in a person or place), again not even
close to what we want. The paths to either `<head/>` or `<title/>` are similarly
disappointing. Although this is not a thorough proof that none of these elements
could fulfil our purpose, it remains a fact that no element in this module
appears as an obvious solution, which is a serious hint to keep looking
elsewhere.
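
The relaxed query can be sketched as a bounded search for inclusion paths. As before, this is a hedged illustration on a hand-written toy graph, not on the TEI schema, and `paths_within` is a hypothetical helper. It also assumes that a "depth of 3" means three elements on a path, that is, two inclusion steps.

```python
from collections import deque

# Toy inclusion graph, hand-written for illustration, NOT an extract of the
# TEI schema: CONTAINS[x] is the set of elements allowed directly inside x.
CONTAINS = {
    "entry": {"sense", "figure"},
    "sense": {"sense", "def", "note"},
    "note": {"p"},
    "figure": {"figure", "head", "p"},
    "def": set(),
    "p": set(),
    "head": set(),
}


def paths_within(contains, start, target, max_edges):
    """Return every inclusion path from `start` to `target` made of at most
    `max_edges` inclusion steps. With start == target this finds short cycles."""
    paths = []
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_edges:
            continue  # this path already uses the maximum number of steps
        for child in sorted(contains.get(path[-1], ())):
            if child == target:
                paths.append(path + (child,))
            elif child not in path:  # do not walk in circles
                queue.append(path + (child,))
    return paths


# Paths from <entry/> to <p/> with at most one intermediate element.
print(paths_within(CONTAINS, "entry", "p", 2))
# Cycles of length at most 2 through <sense/>.
print(paths_within(CONTAINS, "sense", "sense", 2))
```

On this toy graph, the only path from `entry` to `p` within two steps goes through `figure`, which mirrors the kind of unsatisfying intermediate element reported above. Listing the paths rather than merely testing reachability is what allows each intermediate element to be inspected and dismissed.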
#### Widening the search
We hence widen our search to include elements outside the *dictionaries* module
which could be used to encode our sections and subsections, under the same
constraint as before to try and find a composite solution that would remain
under the `<entry/>` element even if resorting to subcomponents outside of the
dedicated module. Only three elements are returned:
- `<figure/>`: no more useful for representing the content of encyclopedic
discourse than as a helper to include paragraphs
- `<metamark/>`: a very useful device to transcribe the editing marks that may
appear on a particular primary source to alter the normal flow of the text and
suggest an alternative reading (deletion, insertion, reordering: this is about
a human editing the text on a given physical copy of it), but again really of no
use for a part of an article describing the geology of Europe, for instance
- `<note/>`: the first element that might at least resemble what we are looking
for. It is meant to contain text, is about explaining something and seems
general enough (not specific to a given genre, or to the occurrence of a
particular object on the page). Unfortunately, its semantics still seems a bit
off compared to our need. The documentation describes it as an "additional
comment" and, moreover, as "out of the main textual stream", whereas the long
developments in articles are the very matter that inhabits the columns of text
encyclopedias are made of.
## Encoding within the *core* module
The above remarks explain why the *dictionaries* module by itself is unable to
represent encyclopedias, where the notion of "meaning" is less central than in
dictionaries and where discourse with nested structures of arbitrary depth can
occur. Since the *core* module of course accommodates these structures by means
of the `<div/>`, `<head/>` and `<p/>` elements, which have the additional
advantage of carrying less semantic payload than `<sense/>` or `<def/>`, we
devise an encoding scheme based on them, which we recommend to other projects
aiming at representing encyclopedias.
To remain consistent with the above remarks we will only concern ourselves with
what happens at the level of each article, right under the `<body/>` element.
......