Commit ccab4d4f authored 3 years ago by Alice Brenon
Keep describing our ideal encoding scheme
parent 82ce3cf7
ICHLL_Brenon.md: 39 additions, 15 deletions
...
...
@@ -573,12 +573,14 @@ article, "Cathète" from tome 9.
### The scheme
Remaining within the *core* module for the structure, almost all useful elements
are available and our encoding scheme merely quotes the official documentation.
Each article is represented by a `<div/>`. We suggest setting an `xml:id`
attribute on it whose value is the head word of the entry, normalised to
lowercase, with spaces stripped and every non-alphanumerical character replaced
by a dash `'-'` to avoid issues with the XML encoding. The head word should be
unique in the whole corpus, or made so by suffixing a number representing its
rank among the various occurrences, even when there is only one, for the sake of
regularity.
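
The normalisation described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the implementation used
for the corpus: the function name `make_xml_id` is hypothetical, "stripping
spaces" is read as removing every whitespace character, accented letters are
treated as alphanumerical, and the rank is appended after a dash and counted
from 0. None of these details is fixed by the text.

```python
import unicodedata


def make_xml_id(headword: str, rank: int) -> str:
    """Derive an xml:id candidate from a head word and its rank (a sketch)."""
    # Compose accents first so that "è" is a single letter, then lowercase.
    lowered = unicodedata.normalize("NFC", headword).lower()
    # Assumption: "stripping spaces" means dropping every whitespace character.
    without_spaces = "".join(ch for ch in lowered if not ch.isspace())
    # Every remaining character that is not a letter or a digit becomes a dash.
    normalised = "".join(ch if ch.isalnum() else "-" for ch in without_spaces)
    # Assumption: the rank is always appended, after a dash, counting from 0.
    return f"{normalised}-{rank}"


print(make_xml_id("Cathète", 0))
```

Appending the rank even to a head word that occurs only once keeps every
identifier in the same shape, which is the regularity argued for above.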

...
...
@@ -624,21 +626,43 @@ is cut from the headword by being in a separate XML element, they still occur on
the same line, which is a typographic choice usually made both in encyclopedias
and dictionaries where space is at a premium.
To complete the structure, the various sections and subsections occurring within
the article body may be nested as usual with `<div/>` and sub-`<div/>`s, each of
which can be titled with its own local `<head/>` element and filled with `<p/>`
for paragraphs.
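
As a rough illustration of this nesting, the following Python sketch builds such
a tree with the standard `xml.etree.ElementTree` module. The helper
`add_section`, the identifier `cathète-0` and all titles and paragraph texts are
placeholders invented for the example, and the TEI namespace is left out for
brevity.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:id attribute in ElementTree's notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def add_section(parent, title, paragraphs):
    """Append a titled <div> holding one <p> per paragraph; return the <div>."""
    section = ET.SubElement(parent, "div")
    ET.SubElement(section, "head").text = title
    for text in paragraphs:
        ET.SubElement(section, "p").text = text
    return section


# The outer <div> stands for the article itself (hypothetical identifier).
article = ET.Element("div", {XML_ID: "cathète-0"})
first = add_section(article, "First section", ["Opening paragraph."])
# A sub-<div> is simply a section added to another section.
add_section(first, "A subsection", ["Nested paragraph."])
add_section(article, "Second section", ["Another paragraph.", "And one more."])

print(ET.tostring(article, encoding="unicode"))
```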

Some articles have figures with captions, which should be encoded the standard
way by `<figure/>` and `<figDesc/>`.
FIGURE ILLUSTRATION
Another issue with giving up on `<entry/>` is the unavailability of the `<xr/>`
element to represent cross-references, which occur in encyclopedias as well as
in dictionaries. We prefer to give it up and keep only the `<ref/>` element,
which is available in the context of a `<p/>`. Another solution would have been
to introduce a `<dictScrap/>` element for the sole purpose of placing an `<xr/>`,
but we advise against it on account of the verbosity it adds to the encoding and
the fact that it implicitly suggests that the previous context was not that of a
dictionary.
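
A cross-reference encoded with `<ref/>` directly inside a `<p/>` might look like
the output of the sketch below. It assumes the reference points at the `xml:id`
of the target article through the usual TEI `target` attribute; the helper name
`add_cross_reference`, the sample wording and the identifier are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_cross_reference(paragraph, label, target_id):
    """Append a <ref> pointing at another article's xml:id (a sketch)."""
    ref = ET.SubElement(paragraph, "ref", {"target": f"#{target_id}"})
    ref.text = label
    return ref


p = ET.Element("p")
p.text = "See "
ref = add_cross_reference(p, "CATHÈTE", "cathète-0")
# Text following the reference is stored as the tail of the <ref> element.
ref.tail = "."

print(ET.tostring(p, encoding="unicode"))
```

No wrapping `<dictScrap/>` or `<xr/>` is needed here, which is where the saving
in verbosity comes from.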
XR ILLUSTRATION
But a typical page of an encyclopedia also features peritext elements, giving
information to the reader about the current page number along with the headwords
of the first and last articles appearing on the page. Those can be encoded by
`<fw/>` elements ("forme work") whose `place` and `type` attributes should be
set to position them on the page and identify their function if it has been
recognized (those short elements on the border of pages are typically the ones
most prone to damage or to being misread by the OCR).
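
To make this concrete, here is a small Python sketch producing `<fw/>` elements.
The helper `forme_work` is hypothetical, and the attribute values (`top`,
`header`, `pageNum`) as well as the sample running title and page number are
only plausible choices for the example, not values prescribed by our scheme.
The `type` attribute is omitted when the function of the element could not be
recognized.

```python
import xml.etree.ElementTree as ET


def forme_work(text, place, kind=None):
    """Build an <fw> element; `kind` stays unset when the function is unknown."""
    attributes = {"place": place}
    if kind is not None:
        attributes["type"] = kind
    fw = ET.Element("fw", attributes)
    fw.text = text
    return fw


running_title = forme_work("ALCALA", "top", "header")
page_number = forme_work("1233", "top", "pageNum")
# A damaged or unreadable fragment: positioned, but with no identified function.
unknown = forme_work("AL..", "top")

for element in (running_title, page_number, unknown):
    print(ET.tostring(element, encoding="unicode"))
```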
Finally there are also TEI elements useful to represent "events" in the flow of
the text, like the beginning of a new column of text or of a new page. The usual
appropriate elements (`<pb/>` for page beginning, `<cb/>` for column beginning)
may and should be used with our encoding scheme.
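
Because these elements are empty milestones, they can sit in the middle of a
paragraph without interrupting it. The Python sketch below shows one way to
place them; the sample text and the use of the `n` attribute for numbering are
assumptions made for the example.

```python
import xml.etree.ElementTree as ET

p = ET.Element("p")
p.text = "end of the first column "
# A column break: the text continues in the tail of the empty <cb> element.
cb = ET.SubElement(p, "cb", {"n": "2"})
cb.tail = "start of the second column, then "
# A page break is encoded the same way with <pb>.
pb = ET.SubElement(p, "pb", {"n": "1234"})
pb.tail = "top of the next page."

print(ET.tostring(p, encoding="unicode"))
```

Reading the paragraph back while ignoring the milestones yields the
uninterrupted text, so the breaks do not get in the way of text extraction.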
ALCALA DE HÉNARÈS
### Currently implemented
...
...