Commit ccab4d4f authored by Alice Brenon

Keep describing our ideal encoding scheme

parent 82ce3cf7
@@ -573,12 +573,14 @@ article, "Cathète" from tome 9.

### The scheme

Remaining within the *core* module for the structure, almost all useful elements
are available and our encoding scheme merely quotes the official documentation.

Each article is represented by a `<div/>`. We suggest setting an `xml:id`
attribute on it whose value is the head word of the entry, normalised to
lowercase, stripping spaces and replacing all non-alphanumerical characters by
a dash `'-'` to avoid issues with the XML encoding. This value must be unique in
the whole corpus; it is made so by suffixing a number representing the rank of
the entry among the various occurrences of the head word, even when there is
only one, for the sake of regularity.

![](snippets/cathète_0.png)
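
As a textual sketch of the identifier rule, an entry whose head word is "Cathète" could open as shown here. The `-0` suffix is an assumption: the text says a rank number is suffixed, but not how it is separated from the head word nor whether numbering starts at 0 or 1. Treating the accented letter as alphanumerical is also an assumption.

```xml
<!-- Hypothetical identifier: head word lowercased, rank suffix format assumed -->
<div xml:id="cathète-0">
  <!-- head word and article body go here -->
</div>
```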

@@ -624,21 +626,43 @@ is cut from the headword by being in a separate XML element, they still occur on
the same line, which is a typographic choice usually made both in encyclopedias
and dictionaries where space is at a premium.

To complete the structure, the various sections and subsections occurring
within the article body may be nested as usual with `<div/>` and sub-`<div/>`s,
filled with `<p/>` for paragraphs, and each `<div/>` can be titled with its own
local `<head/>` element.

![](snippets/cathète_3.png)
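
A minimal sketch of such nesting might look like this. The titles and paragraph contents are placeholders, not text from the corpus.

```xml
<!-- Hypothetical article body: one section containing one subsection -->
<div>
  <head>Section title</head>
  <p>First paragraph of the section.</p>
  <div>
    <head>Subsection title</head>
    <p>Paragraph belonging to the subsection.</p>
  </div>
</div>
```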

Some articles have figures with captions, which should be encoded in the
standard way with `<figure/>` and `<figDesc/>`.
FIGURE ILLUSTRATION
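
A figure encoded this way could look like the following sketch. The `url` value and the caption text are hypothetical. The sketch follows the text in placing the caption in `<figDesc/>`; the TEI Guidelines also allow a `<head/>` inside `<figure/>` for a printed caption.

```xml
<!-- Hypothetical figure: file path and caption are placeholders -->
<figure>
  <graphic url="figures/example.png"/>
  <figDesc>Caption as printed under the figure.</figDesc>
</figure>
```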

Another consequence of giving up on `<entry/>` is the unavailability of the
`<xr/>` element to represent cross-references, which occur in encyclopedias as
well as in dictionaries. We prefer to give it up and keep only the `<ref/>`
element, which is available in the context of a `<p/>`. Another solution would
have been to introduce a `<dictScrap/>` element for the sole purpose of holding
an `<xr/>`, but we advise against it on account of the verbosity it adds to the
encoding and the fact that it implicitly suggests that the previous context was
not that of a dictionary.

XR ILLUSTRATION
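
As a sketch, a cross-reference kept inside a paragraph with `<ref/>` might read as follows. The `target` value and the surrounding sentence are hypothetical; pointing at the `xml:id` of the target article is one plausible choice, not something the text prescribes.

```xml
<!-- Hypothetical cross-reference: target and wording are placeholders -->
<p>This line is defined under another head word; see
  <ref target="#cathète-0">Cathète</ref>.</p>
```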

But a typical page of an encyclopedia also features peritext elements, giving
information to the reader about the current page number along with the headwords
of the first and last articles appearing on the page. Those can be encoded by
`<fw/>` elements ("forme work") whose `place` and `type` attributes should be
set to position them on the page and identify their function if it has been
recognized (those short elements on the border of pages are the ones typically
prone to suffer damage or be misread by the OCR).
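
For instance, a running head and a page number could be encoded as in this sketch. The contents are placeholders, and the attribute values are taken from the values suggested by the TEI Guidelines rather than from this project, so they are an assumption about what the scheme would use.

```xml
<!-- Hypothetical peritext for one page: contents are placeholders -->
<fw type="header" place="top">FIRST HEADWORD</fw>
<fw type="pageNum" place="top">123</fw>
<!-- type may be omitted when the function was not recognized -->
<fw place="bottom">unreadable fragment</fw>
```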

Finally, there are also TEI elements useful to represent "events" in the flow of
the text, like the beginning of a new column of text or of a new page. The usual
appropriate elements (`<pb/>` for page beginning, `<cb/>` for column beginning)
may and should be used with our encoding scheme.
ALCALA DE HÉNARÈS
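
A sketch of these milestones inside running text could look like this. The `n` values and the paragraph content are hypothetical.

```xml
<!-- Hypothetical fragment: a new page starts, then the text runs over two columns -->
<pb n="123"/>
<cb n="a"/>
<p>Text at the top of the first column, which carries on
  <cb n="b"/>
  at the top of the second column.</p>
```

Because both elements are empty milestones, they can sit in the middle of a paragraph without splitting it.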

### Currently implemented