Alice Brenon / ICHLL11 Article / Commits

Commit 4cdcc821, authored 3 years ago by Alice Brenon

First batch of fixes with feedback from Ludo (thanks !!)

Parent: 77ed690b
No related branches, tags, or merge requests found.

Changes: 1 changed file, ICHLL_Brenon.md, with 29 additions and 18 deletions.
```diff
 ---
 title: The specificities of encoding encyclopedias: towards a new standard ?
 author: Alice BRENON
+numbersections: True
 header-includes:
   \usepackage{textalpha}
   \usepackage{hyperref}
```
@@ -221,15 +222,18 @@ element to the dictionary module: indeed, although `<body/>` may also contain
...
@@ -221,15 +222,18 @@ element to the dictionary module: indeed, although `<body/>` may also contain
`<entryFree/>`
or
`<superEntry/>`
elements, the former is a relaxed version of
`<entryFree/>`
or
`<superEntry/>`
elements, the former is a relaxed version of
`<entry/>`
while the latter is a device to group several related entries
`<entry/>`
while the latter is a device to group several related entries
together. Both can contain an
`<entry/`
directly while no obvious inclusion
together. Both can contain an
`<entry/`
directly while no obvious inclusion
exists the other way around
. M
ost (> 96.2%) of the inclusion paths of
exists the other way around
: m
ost (> 96.2%) of the inclusion paths of
"reasonable" depth (which we define as strictly inferior to 5, that is twice the
"reasonable" depth (which we define as strictly inferior to 5, that is twice the
average shortest depth between any two nodes) seem to either include
`<figure/>`
average shortest depth between any two nodes) either include
`<figure/>`
or
or
`<castList/>`
, two elements unrelated to encyclopedia articles in the general
`<castList/>`
, two very specific elements which should not need to appear in an
case. Hence, not only the semantics conveyed by the documentation but also the
article in general, showing that the purpose of
`<entry/>`
is not to contain an
structure of the elements graph evidence
`<entry/>`
as the natural top-most
`<entryFree/>`
or
`<superEntry/>`
. Hence, not only the semantics conveyed by the
element for an article.
documentation but also the structure of the elements graph evidence
`<entry/>`
as the natural top-most element for an article. This somewhat contrived example
hopes to further demonstrate the application of a graph-centered approach to
understand the inner workings of the XML-TEI schema.
### Information about the word itself
### Information about the
head
word itself
Once a block for an article is created, it may contain elements useful to
Once a block for an article is created, it may contain elements useful to
represent features such as
represent features such as
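The paragraph reworded in this hunk rests on a graph view of the schema: elements are nodes, an edge goes from a parent to each element it may directly contain, and the "reasonable" depth bound is twice the average shortest distance between two nodes. The Python sketch below illustrates that computation on a small hypothetical graph. `TOY_GRAPH` is invented for illustration and is not the real TEI content model, so its numbers say nothing about the 96.2% figure. The sketch also assumes that depth is counted in edges and that the average is taken over all ordered pairs of distinct nodes joined by a path; the article may define both differently.

```python
import math
from collections import deque

# Hypothetical toy graph, NOT the real TEI content model: an edge
# parent -> child means "child may appear directly inside parent".
TOY_GRAPH = {
    "superEntry": ["entry", "entryFree"],
    "entryFree": ["entry"],
    "entry": ["sense", "figure"],
    "sense": ["p", "superEntry"],
    "p": ["castList", "figure"],
    "figure": ["p", "entryFree"],
    "castList": ["p", "superEntry"],
}


def bfs_distances(graph, source):
    """Edges on a shortest path from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in dist:
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist


def average_shortest_depth(graph):
    """Mean shortest distance over ordered pairs (u, v), u != v, with v reachable from u."""
    lengths = [
        d
        for source in graph
        for target, d in bfs_distances(graph, source).items()
        if target != source
    ]
    return sum(lengths) / len(lengths)


def simple_paths(graph, source, target, max_edges):
    """Depth-first enumeration of simple paths source -> target with at most max_edges edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            yield path
            continue
        if len(path) - 1 == max_edges:
            continue  # depth bound reached without hitting the target
        for child in graph.get(node, []):
            if child not in path:  # keep the path simple (no repeated node)
                stack.append(path + [child])


def share_through(graph, source, targets, via, limit):
    """Return (paths visiting a node of `via`, all paths) among the simple
    paths from `source` to any of `targets` with strictly fewer than `limit` edges."""
    max_edges = math.ceil(limit) - 1  # "strictly inferior to" the limit
    paths = [p for t in targets for p in simple_paths(graph, source, t, max_edges)]
    hits = [p for p in paths if via.intersection(p[1:-1])]
    return len(hits), len(paths)


if __name__ == "__main__":
    limit = 2 * average_shortest_depth(TOY_GRAPH)
    hits, total = share_through(
        TOY_GRAPH, "entry", ["entryFree", "superEntry"], {"figure", "castList"}, limit
    )
    print(f"limit={limit:.2f}: {hits}/{total} paths go through <figure/> or <castList/>")
```

On this toy graph the bound is about 4.05, so paths of up to 4 edges are kept, and 4 of the 6 paths from `entry` to `entryFree` or `superEntry` go through `figure` or `castList`. Enumerating simple paths grows exponentially with depth in general, which is one practical reason to cap it.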
```diff
@@ -240,9 +244,9 @@ represent features such as
 form itself for instance, but also information about the categories it belongs
 to like `<iType/>` for its inflection class in languages with a declension
 system or `<pos/>` for its part-of-speech
-- its etymology
+- its etymology: `<etym/>`
 - its variants if there is a different spelling in a variety of the language or
-if it has changed through time
+if it has changed through time: `<usg/>` (though it is not its only purpose)
 
 All these are examples and by no means an exhaustive list; the complete set
 provides the encoder with a toolbox to describe all the information related to
```
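The list completed in this hunk maps each kind of information about the headword to a TEI element. As an illustration, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` that assembles an `<entry/>` for a hypothetical English headword: its written form in `<form/>`/`<orth/>`, its part-of-speech in `<gramGrp/>`/`<pos/>`, a regional variant qualified by `<usg/>`, and its etymology in `<etym/>`. The headword, the helper `tei()` and the exact nesting (in particular `<usg/>` inside the variant `<form/>`) are assumptions made for the example, and the output has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag, parent=None, text=None, **attrs):
    """Create a TEI element (hypothetical helper), attached to `parent` if given."""
    name = f"{{{TEI_NS}}}{tag}"
    if parent is None:
        elem = ET.Element(name, attrs)
    else:
        elem = ET.SubElement(parent, name, attrs)
    elem.text = text
    return elem


def build_entry():
    """Assemble an illustrative <entry/> for the English headword "colour"."""
    entry = tei("entry")
    lemma = tei("form", entry, type="lemma")
    tei("orth", lemma, "colour")               # written form of the headword
    gram = tei("gramGrp", entry)
    tei("pos", gram, "noun")                   # part-of-speech
    variant = tei("form", entry, type="variant")
    tei("orth", variant, "color")
    tei("usg", variant, "US", type="geo")      # where the variant is used
    tei("etym", entry, "from Old French colour, from Latin color")
    return entry


if __name__ == "__main__":
    entry = build_entry()
    ET.indent(entry)  # pretty-printing, available since Python 3.9
    print(ET.tostring(entry, encoding="unicode"))
```

Building the tree programmatically rather than concatenating strings guarantees well-formed XML and correct namespace declarations; schema validity still has to be checked separately, for instance against a TEI RELAX NG schema.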
```diff
@@ -275,9 +279,10 @@ content associated to the headword by the entry. In a dictionary, that is its
 meaning.
 
 The `<sense/>` element is a valid child for `<entry/>` and groups together a
-definition of the term with `<def/>`, usage examples with `<usg/>` and other
-high-level information such as translations in other languages. Both `<def/>`
-and `<usg/>` elements may appear directly under the `<entry/>`.
+definition of the term with `<def/>`, usage examples with `<usg/>` (another use
+of this versatile element) and other high-level information such as translations
+in other languages. Both `<def/>` and `<usg/>` elements may appear directly
+under the `<entry/>`.
 
 ### Structural remarks
 
```
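Since both `<def/>` and `<usg/>` may sit either inside a `<sense/>` or directly under `<entry/>`, a program reading such files cannot assume a single location for them. The hypothetical Python sketch below builds the same minimal entry both ways and collects definitions with `Element.iter()`, which walks all descendants and therefore covers both layouts. The headword, the `type="dom"` label and the helper names are illustrative choices, not taken from the article.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace in ElementTree's {uri}tag notation
DEFINITION = "either of the two sides of a right triangle that form the right angle"


def add(parent, tag, text=None, **attrs):
    """Append a TEI child element to `parent` (hypothetical helper) and return it."""
    child = ET.SubElement(parent, TEI + tag, attrs)
    child.text = text
    return child


def make_entry(grouped):
    """Build an entry for "cathetus", with <def/> and <usg/> either grouped
    inside a <sense/> (grouped=True) or placed directly under <entry/>."""
    entry = ET.Element(TEI + "entry")
    add(add(entry, "form", type="lemma"), "orth", "cathetus")
    holder = add(entry, "sense") if grouped else entry
    add(holder, "usg", "geometry", type="dom")  # domain label
    add(holder, "def", DEFINITION)
    return entry


def definitions(entry):
    """Collect <def/> texts at any depth, so both layouts are handled."""
    return [d.text for d in entry.iter(TEI + "def")]


if __name__ == "__main__":
    for grouped in (True, False):
        print("grouped" if grouped else "flat", definitions(make_entry(grouped)))
```

A lookup restricted to direct children, such as `entry.find(TEI + "def")`, would only match the flat layout; searching all descendants trades a little precision for robustness to this optional grouping.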
```diff
@@ -298,7 +303,8 @@ that the *dictionaries* module contains short "leaf" elements like `<pos/>`
 which should not obviously need to admit cycles since one rather expects them to
 contain only one word, like `<pos>adj</pos>` in the example given in the
 official documentation. Among those (shortest) cycles, 20 include the `<cit/>`
-element made to group quotations with a bibliographic reference to their source.
+element made to group quotations with a bibliographic reference to their source
+which should clearly be unnecessary to encode an article in the general case.
 
 Secondly, although we have seen examples of connections from this module to the
 rest of the XML-TEI, especially to the *core* module (see the case of the
```
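The cycle count in this hunk depends on what "shortest cycle" means. One plausible reading, sketched below in Python, is to take for each element a shortest directed cycle passing through it (found by breadth-first search back to the starting node), then merge cycles that are rotations of one another. Both this definition and `TOY_GRAPH` are assumptions for illustration: the graph is invented and is not the TEI schema, so the result is unrelated to the 20 cycles reported in the article.

```python
from collections import deque

# Hypothetical toy graph, NOT the real TEI content model:
# parent -> elements it may directly contain.
TOY_GRAPH = {
    "pos": ["note"],
    "note": ["cit", "pos"],
    "cit": ["quote", "bibl"],
    "quote": ["pos"],
    "bibl": ["note"],
    "def": ["cit"],
}


def shortest_cycle_through(graph, start):
    """Shortest directed cycle through `start`, as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child == start:
                # BFS pops nodes by increasing distance, so the first edge
                # back to `start` closes a shortest cycle: rebuild it.
                cycle = [node]
                while parent[cycle[-1]] is not None:
                    cycle.append(parent[cycle[-1]])
                return cycle[::-1]
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None


def canonical(cycle):
    """Rotate a cycle to start at its smallest node, so rotations compare equal."""
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def distinct_shortest_cycles(graph):
    """Set of distinct cycles, each being a shortest cycle through some node."""
    found = set()
    for node in graph:
        cycle = shortest_cycle_through(graph, node)
        if cycle is not None:
            found.add(canonical(cycle))
    return found


if __name__ == "__main__":
    cycles = distinct_shortest_cycles(TOY_GRAPH)
    with_cit = sorted(c for c in cycles if "cit" in c)
    print(len(cycles), "distinct shortest cycles,", len(with_cit), "through cit")
```

Here three distinct cycles remain, two of which contain `cit`, while `def` lies on no cycle at all. Breadth-first search returns a single shortest cycle per node even when several of the same length exist, so another tie-breaking rule could give a different count.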
```diff
@@ -420,11 +426,16 @@ often
 ### Currently implemented
 
-The reference implementation for this encoding scheme is the program `soprano`
-developed within the scope of project DISCO-LGE. Though this software is already
-useful to segment the text of the encyclopedia into articles and encode them
-into XML-TEI, it doesn't yet follow the above specification perfectly. Here is
-for instance the encoded version of article "Cathète" currently it produces:
+The reference implementation for this encoding scheme is the program
+soprano[^soprano] developed within the scope of project DISCO-LGE to
+automatically identify individual articles in the flow of raw text from the
+column and to encode them into XML-TEI files. Though this software has already
+been used to produce the first TEI version of *La Grande Encyclopédie*, it
+doesn't yet follow the above specification perfectly. Here is for instance the
+encoded version of article "Cathète" currently it produces:
+
+[^soprano]:
+[https://gitlab.huma-num.fr/disco-lge/soprano](https://gitlab.huma-num.fr/disco-lge/soprano)
 
```