From 0ea1d173c9c188a7d32f78ecd49565841ffe550a Mon Sep 17 00:00:00 2001
From: Alice BRENON <alice.brenon@ens-lyon.fr>
Date: Wed, 23 Feb 2022 16:07:37 +0100
Subject: [PATCH] Developping description of XML-TEI some more

---
 ICHLL_Brenon.md | 52 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/ICHLL_Brenon.md b/ICHLL_Brenon.md
index 356d250..b6a6a24 100644
--- a/ICHLL_Brenon.md
+++ b/ICHLL_Brenon.md
@@ -212,24 +212,50 @@ near the "surface" of article entries.
 
 The central element of the *dictionaries* module is the `<entry/>` element meant
 to encode one single entry in a dictionary, that is to say a head word
-associated to its definition. It is the natural entry point from the `<body/>`
+associated to its definition. It is the natural way in from the `<body/>`
 element to the dictionary module: indeed, although `<body/>` may also contain
 `<entryFree/>` or `<superEntry/>` elements, the former is a relaxed version of
 `<entry/>` while the latter is a device to group several related entries
 together. Both can contain an `<entry/` directly while no obvious inclusion
-exists the other way around. Most of the inclusion paths of "reasonable" depth
-(which we define to strictly inferior to 5, that is twice the average shortest
-depth between any two nodes) seem to either include `<figure/>`
+exists the other way around. Most (> 96.2%) of the inclusion paths of
+"reasonable" depth (which we define as strictly inferior to 5, that is twice the
+average shortest depth between any two nodes) seem to either include `<figure/>`
+or `<castList/>`, two elements unrelated to encyclopedia articles in the general
+case. Hence, not only the semantics conveyed by the documentation but also the
+structure of the elements graph evidence `<entry/>` as the natural top-most
+element for an article.
+
+### Information about the word itself
+
+Once a block for an article is created, it may contain elements useful to
+represent features such as
+
+- its written and spoken forms: `<form/>`
+- a group of grammatical information: `<gramGrp/>`, that may itself contain as
+  we've seen above `<case/>`, `<gen/>`, `<number/>` or `<pers/>` to describe the
+  form itself for instance, but also information about the categories it belongs
+  to like `<iType/>` for its inflexion class or `<pos/>` for its part-of-speech
+- its etymology
+- its variants if there is a different spelling in a variety of the language or
+  if it has changed through time
+
+All these are examples and by no means an exhaustive list; the complete set
+provides the encoder with a toolbox to describe all the information related to
+the form the entry is found at and seems general enough to accommodate the
+structure of any book indexing entries by words.
+
+### Cross-references
+
+A common feature shared by dictionaries and encyclopedias is the ability to
+connect entries together by using a word or short phrase as the link, referring
+the reader to the related concept. Such links are known as cross-references and
+can appear either when the definition of a term is adjacent to another one or
+to catch alternative spellings where some readers might expect the word to
+appear and redirect them to the form chosen as the reference. In XML-TEI, this
+is done with the `<xr/>` element.
+
+### Content
 
-Once a block for an article is created
-
-It contain elements useful to represent the features occurring at the begining
-of an article such as its written and spoken forms (`<form/>`), a group of
-grammatical information (`<gramGrp/>`), that may itself contain as we've seen
-above `<case/>`, `<gen/>`, `<number/>` or `<pos/>` to describe the form itself for instance, or `
-
-All these are quite exhaustive and seem general enough to accomodate any book
-structure indexing entries by words. A more
 
 # A new standard ?
 
-- 
GitLab