Commit ccab4d4f authored by Alice Brenon

Keep describing our ideal encoding scheme

parent 82ce3cf7
......@@ -573,12 +573,14 @@ article, "Cathète" from tome 9.
### The scheme
Remaining within the *core* module for the structure, almost all useful elements
are available and our encoding scheme merely quotes the official documentation.
Each article is represented by a `<div/>`. We suggest setting an `xml:id`
attribute on it whose value is the head word of the entry, normalised to
lowercase, with spaces stripped and every non-alphanumerical character replaced
by a dash `'-'` to avoid issues with the XML encoding. This value must be unique
in the whole corpus, so a number representing the rank of the entry among the
various occurrences of that head word is appended to it; for the sake of
regularity, the number is appended even when there is only one occurrence.
![](snippets/cathète_0.png)
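
As a rough sketch of the identifier rule described above, the article "Cathète"
could receive an `xml:id` like the one below. The dash before the rank and the
choice to start counting at 0 are assumptions made for this illustration; the
text only requires that a number be suffixed.

```xml
<!-- Hypothetical identifier: head word lowercased, then the rank suffix.
     The separator and the starting rank (0) are assumptions. -->
<div xml:id="cathète-0">
  <!-- head word and body of the article go here -->
</div>
```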
......@@ -624,21 +626,43 @@ is cut from the headword by being in a separate XML element, they still occur on
the same line, which is a typographic choice usually made both in encyclopedias
and dictionaries where space is at a premium.
To complete the structure, the various sections and subsections occurring
within the article body may be nested as usual with `<div/>` and sub-`<div/>`s,
each of which can be titled with its own local `<head/>` element and filled
with `<p/>` elements for paragraphs.
![](snippets/cathète_3.png)
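
The nesting described above might look like the following minimal sketch. The
titles and paragraph texts are placeholders and the `xml:id` value is
hypothetical; only the `<div/>`, `<head/>` and `<p/>` elements come from the
scheme itself.

```xml
<div xml:id="cathète-0">
  <div>
    <head>Title of a section</head>
    <p>First paragraph of the section.</p>
    <p>Second paragraph of the section.</p>
    <div>
      <head>Title of a subsection</head>
      <p>Paragraph of the subsection.</p>
    </div>
  </div>
</div>
```

Paragraphs are placed before the nested `<div/>` in each section, which is the
order TEI expects inside a `<div/>`.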
Some articles have figures with captions, which should be encoded the standard
way by `<figure/>` and `<figDesc/>`.
FIGURE ILLUSTRATION
Depending
Another consequence of giving up on `<entry/>` is that the `<xr/>` element,
which represents the cross-references occurring in encyclopedias as well as in
dictionaries, is no longer available. We prefer to do without it and keep only
the `<ref/>` element, which is available in the context of a `<p/>`. Another
solution would have been to introduce a `<dictScrap/>` element for the sole
purpose of placing an `<xr/>`, but we advise against it on account of the
verbosity it adds to the encoding and the fact that it implicitly suggests that
the previous context was not that of a dictionary.
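
A cross-reference encoded with `<ref/>` inside a paragraph could be sketched as
follows. Pointing the `target` attribute at the `xml:id` of the referenced
article is an assumption of this example, as are the identifier `triangle-0`
and the sentence itself.

```xml
<p>Text of the article mentioning another entry
  (see <ref target="#triangle-0">Triangle</ref>).</p>
```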
Moreover, the layout is
often
XR ILLUSTRATION
But a typical page of an encyclopedia also features peritext elements, giving
information to the reader about the current page number along with the headwords
of the first and last articles appearing on the page. Those can be encoded by
`<fw/>` elements ("forme work") whose `place` and `type` attributes should be
set to position them on the page and to identify their function if it has been
recognized (those short elements on the border of pages are the ones typically
prone to suffer damage or be misread by the OCR).
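
For instance, the top of a page could be sketched like this. The attribute
values `header`, `pageNum` and `top` are only illustrative choices for this
example, and the contents are placeholders rather than text taken from an
actual page.

```xml
<fw type="header" place="top">FIRST HEADWORD OF THE PAGE</fw>
<fw type="pageNum" place="top">123</fw>
<fw type="header" place="top">LAST HEADWORD OF THE PAGE</fw>
```

When the function of such an element could not be recognized, the `type`
attribute can simply be left out.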
Finally, there are also TEI elements useful to represent "events" in the flow of
the text, like the beginning of a new column of text or of a new page. The usual
appropriate elements (`<pb/>` for page beginning, `<cb/>` for column beginning)
may and should be used with our encoding scheme.
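
A paragraph running across two columns of a new page might be marked up as in
the sketch below. The `n` values and the text are placeholders chosen for the
illustration.

```xml
<pb n="124"/>
<cb n="1"/>
<p>Text at the bottom of the first column
<cb n="2"/>
continuing at the top of the second column.</p>
```

Both elements are empty milestones, so they can sit inside a `<p/>` without
interrupting it.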
ALCALA DE HÉNARÈS
### Currently implemented