Keep describing our ideal encoding scheme

Commit ccab4d4f authored 3 years ago by Alice Brenon
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -573,12 +573,14 @@ article, "Cathète" from tome 9.
 ### The scheme
+Since we remain within the *core* module for the structure, almost all useful
+elements are available, and our encoding scheme merely quotes the official
+documentation.
 Each article is represented by a `<div/>`. We suggest setting an `xml:id`
-attribute on it with as value the — unique, or made so by suffixing a number
+attribute on it whose value is the headword of the entry, normalised to
-representing its rank among the various occurrences, even when there's only one
+lowercase, stripping spaces and replacing all non-alphanumeric characters with
-for the sake of regularity — head word of the entry, normalised to lowercase,
+a dash `'-'` to avoid issues with the XML encoding. This value must be unique in
-stripping spaces and replacing all non-alphanumerical characters by a dash `'-'`
+the whole corpus, or made so by suffixing a number representing its rank among
-to avoid issues with the XML encoding.
+the various occurrences, even when there is only one, for the sake of regularity.
 ![](snippets/cathète_0.png)
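The identifier scheme above can be sketched in Python as follows. This is only a sketch under assumptions of ours, not the code used to produce the corpus: the function names are hypothetical, "alphanumeric" is taken in the Unicode sense (so accented letters such as `è` are kept), and the rank suffix is joined with a dash and starts at 1.

```python
import re
from collections import Counter


def normalise_headword(headword: str) -> str:
    """Lowercase the headword, strip whitespace and replace every remaining
    non-alphanumeric character (including '_') with a dash."""
    lowered = re.sub(r"\s+", "", headword.lower())
    # In Python 3, \W is Unicode-aware: accented letters count as alphanumeric.
    return re.sub(r"[\W_]", "-", lowered)


def article_ids(headwords: list[str]) -> list[str]:
    """Give each article the id '<normalised headword>-<rank>', where the rank
    (starting at 1, our assumption) counts the occurrences of the same
    normalised headword in the whole corpus, even when there is only one."""
    seen = Counter()
    ids = []
    for headword in headwords:
        base = normalise_headword(headword)
        seen[base] += 1
        ids.append(f"{base}-{seen[base]}")
    return ids


print(article_ids(["Cathète", "ALCALA DE HÉNARÈS", "CATHÈTE"]))
# → ['cathète-1', 'alcaladehénarès-1', 'cathète-2']
```

Note that such an identifier can still start with a digit, which XML forbids for `xml:id` values, so headwords of that kind would need additional handling.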
@@ -624,21 +626,43 @@ is cut from the headword by being in a separate XML element, they still occur on
 the same line, which is a typographic choice usually made in both encyclopedias
 and dictionaries, where space is at a premium.
-Finally, the various sections and sub-sections occurring within the article body
+To complete the structure, the various sections and subsections occurring
-may be nested as usual with `<div/>` and sub-`<div/>`s, filled with `<p/>` for
+within the article body may be nested as usual with `<div/>` and sub-`<div/>`s,
-paragraphs which can each be titled with `<head/>` elements local to each
+filled with `<p/>` for paragraphs; each `<div/>` can be titled with its own
-`<div/>`.
+local `<head/>` element.
 ![](snippets/cathète_3.png)
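To make the nesting concrete, here is a hypothetical helper building the `<div/>` of a section with Python's standard `xml.etree.ElementTree`. The `Section` class is our own minimal in-memory representation, not part of the scheme, and the TEI namespace is omitted for brevity.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical in-memory view of a section of an article body."""
    title: str | None = None
    paragraphs: list[str] = field(default_factory=list)
    subsections: list[Section] = field(default_factory=list)


def section_to_div(section: Section) -> ET.Element:
    """Encode a section as a <div/>: an optional local <head/>, then its
    <p/> paragraphs, then its subsections as nested sub-<div/>s."""
    div = ET.Element("div")
    if section.title is not None:
        ET.SubElement(div, "head").text = section.title
    for paragraph in section.paragraphs:
        ET.SubElement(div, "p").text = paragraph
    for subsection in section.subsections:
        div.append(section_to_div(subsection))  # recursion gives sub-<div/>s
    return div


body = Section("Section title", ["First paragraph."],
               [Section("Subsection title", ["Second paragraph."])])
print(ET.tostring(section_to_div(body), encoding="unicode"))
# → <div><head>Section title</head><p>First paragraph.</p><div><head>Subsection title</head><p>Second paragraph.</p></div></div>
```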
-But a typical page of an encyclopedia also features peritext elements, giving
+Some articles have figures with captions, which should be encoded in the
-information to the reader about the current page number along with the headwords
+standard way with `<figure/>` and `<figDesc/>`.
-of the first and last articles appearing on the page.
+FIGURE ILLUSTRATION
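A figure can be appended to an article in the same way. In this sketch the caption goes into `<figDesc/>` as the scheme prescribes; the optional `<graphic/>` child pointing to an image file, and the file name used below, are assumptions of ours.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def add_figure(parent: ET.Element, caption: str,
               image_url: str | None = None) -> ET.Element:
    """Append a <figure/> to `parent`, with the caption in <figDesc/> and,
    optionally (our assumption), a <graphic/> pointing to the image file."""
    figure = ET.SubElement(parent, "figure")
    if image_url is not None:
        ET.SubElement(figure, "graphic", url=image_url)
    ET.SubElement(figure, "figDesc").text = caption
    return figure


article = ET.Element("div")
add_figure(article, "Caption of the figure.", "figure-1.png")  # hypothetical file
print(ET.tostring(article, encoding="unicode"))
# → <div><figure><graphic url="figure-1.png" /><figDesc>Caption of the figure.</figDesc></figure></div>
```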
-Depending
+Another consequence of giving up on `<entry/>` is that the `<xr/>` element,
+which represents cross-references, becomes unavailable, although
+cross-references occur in encyclopedias as well as in dictionaries. We prefer to
+do without it and keep only the `<ref/>` element, which is available in the
+context of a `<p/>`. Another solution would have been to introduce a
+`<dictScrap/>` element for the sole purpose of holding an `<xr/>`, but we argue
+against it on account of the verbosity it adds to the encoding and because it
+implicitly suggests that the surrounding context was not that of a dictionary.
-Moreover, the layout is
+XR ILLUSTRATION
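Since `<ref/>` appears in mixed content, building it programmatically with ElementTree means placing the surrounding text in the `text` and `tail` properties of the elements. The sketch below assumes that a cross-reference points to the `xml:id` of the target article through a `#`-prefixed `target` attribute; the input format and the identifier used are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def paragraph_with_refs(parts: list[str | tuple[str, str]]) -> ET.Element:
    """Build a <p/> from a hypothetical list mixing plain strings and
    (label, target xml:id) pairs, the latter becoming <ref/> elements."""
    p = ET.Element("p")
    last = None  # last <ref/> added: the text that follows goes into its tail
    for part in parts:
        if isinstance(part, str):
            if last is None:
                p.text = (p.text or "") + part
            else:
                last.tail = (last.tail or "") + part
        else:
            label, target_id = part
            last = ET.SubElement(p, "ref", target="#" + target_id)
            last.text = label
    return p


paragraph = paragraph_with_refs(["See ", ("TRIANGLE", "triangle-1"), "."])
print(ET.tostring(paragraph, encoding="unicode"))
# → <p>See <ref target="#triangle-1">TRIANGLE</ref>.</p>
```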
-often 
+But a typical page of an encyclopedia also features peritext elements giving
+the reader information about the current page number along with the headwords
+of the first and last articles appearing on the page. These can be encoded with
+`<fw/>` ("forme work") elements, whose `place` and `type` attributes should be
+set to position them on the page and, when it has been recognized, to identify
+their function (these short elements at the edge of the page are typically the
+most prone to damage or to being misread by the OCR).
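A hypothetical helper for forme work could always set `place` and set `type` only when the function of the element has been recognized, leaving it out otherwise. The values `top`, `bottom`, `header` and `pageNum` used below are sample values from the TEI Guidelines rather than a closed list fixed by our scheme, and the page contents are placeholders.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def forme_work(text: str, place: str, fw_type: str | None = None) -> ET.Element:
    """Encode a running element of the page as <fw/>; `type` is only set
    when the function of the element has been recognized."""
    fw = ET.Element("fw", place=place)
    if fw_type is not None:
        fw.set("type", fw_type)
    fw.text = text
    return fw


for fw in (forme_work("123", "top", "pageNum"),          # placeholder page number
           forme_work("FIRST LAST", "top", "header"),    # placeholder headwords
           forme_work("unrecognized", "bottom")):        # function unknown: no type
    print(ET.tostring(fw, encoding="unicode"))
```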
+Finally, TEI also provides elements to represent "events" in the flow of the
+text, such as the beginning of a new column or of a new page. The usual
+elements (`<pb/>` for page beginning, `<cb/>` for column beginning) can and
+should be used with our encoding scheme.
+ALCALA DE HÉNARÈS
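Because `<pb/>` and `<cb/>` are empty milestone elements, they can be inserted in the middle of a paragraph without breaking the XML tree. The sketch below assumes an OCR output already split into chunks tagged with their page and column numbers (a hypothetical input format) and emits a `<pb/>` followed by a `<cb/>` on each new page, and a `<cb/>` alone on each new column.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def paragraph_across_breaks(chunks: list[tuple[int, int, str]]) -> ET.Element:
    """Build a <p/> from (page, column, text) chunks in reading order,
    inserting <pb/> and <cb/> milestones where the page or column changes."""
    p = ET.Element("p")
    last = None           # last milestone added: following text goes in its tail
    page = column = None  # position of the previous chunk
    for new_page, new_column, text in chunks:
        if page is not None and new_page != page:
            ET.SubElement(p, "pb", n=str(new_page))
            last = ET.SubElement(p, "cb", n=str(new_column))
        elif column is not None and new_column != column:
            last = ET.SubElement(p, "cb", n=str(new_column))
        if last is None:
            p.text = (p.text or "") + text
        else:
            last.tail = (last.tail or "") + text
        page, column = new_page, new_column
    return p


chunks = [(12, 1, "End of the first column, "),
          (12, 2, "start of the second; "),
          (13, 1, "then a new page.")]
print(ET.tostring(paragraph_across_breaks(chunks), encoding="unicode"))
# → <p>End of the first column, <cb n="2" />start of the second; <pb n="13" /><cb n="1" />then a new page.</p>
```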
 ### Currently implemented