diff --git a/README.md b/README.md
index 9aa97b2f50319cb1edcafd7e1b8bca64f2cbf182..dba456c8031fd9db31e0c777528df9b821ead04b 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,14 @@
-# Sampling Géo-FR EDdA-DUT
+# Sampling of French geography articles in Diderot and d'Alembert's Encyclopédie
 
+This repository is provided by **Ludovic Moncla** and **Denis Vigier** as part of the [GEODE Project](https://geode-project.github.io/).
+It contains the code developed to select the sample of articles dealing with French geography in Diderot and d'Alembert's Encyclopédie (EDdA) and in the Dictionnaire Universel de Trévoux (DUT).
+
+## Overview
+
+![Overview diagram](./figures/schema.png)
+
+
+
+## Acknowledgements
+
+The authors thank the [LABEX ASLAN](https://aslan.universite-lyon.fr/) (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investissements d'Avenir" managed by the Agence Nationale de la Recherche (ANR).
diff --git a/figures/schema.png b/figures/schema.png
new file mode 100644
index 0000000000000000000000000000000000000000..cb60773fb89e93902f97b28cd3d1d74eed79b798
Binary files /dev/null and b/figures/schema.png differ
diff --git a/samplingGeoFR-EDdA.ipynb b/samplingGeoFR-EDdA.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..a67afbda00ce0bc1a865374af2884739e514ee98
--- /dev/null
+++ b/samplingGeoFR-EDdA.ipynb
@@ -0,0 +1,4719 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Filtering the geography articles of the EDdA\n",
+    "\n",
+    "This notebook is provided by [L. Moncla](https://ludovicmoncla.github.io/) and [D. Vigier](http://www.icar.cnrs.fr/membre/dvigier/) as part of the [GEODE](https://geode-project.github.io/) project.\n",
+    "\n",
+    "For the paper submitted to an issue of the journal *Langue française*, we want to filter the EDdA articles that describe a place located in France. A random subset of these articles will be selected and compared with the Dictionnaire de Trévoux.\n",
+    "The articles are divided into 4 subgroups according to their author:\n",
+    "1. Diderot\n",
+    "2. Jaucourt\n",
+    "3. Other author\n",
+    "4. Unsigned\n",
+    "\n",
+    "Once these 4 subgroups are selected, each one is split again into 4 subgroups according to article length (number of words), using its quartiles as boundaries.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Importing the libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import random\n",
+    "import shutil\n",
+    "import lxml.etree as etree\n",
+    "from sentence_splitter import SentenceSplitter, split_text_into_sentences\n",
+    "import re\n",
+    "import pandas as pd\n",
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Loading the data produced by PERDIDO\n",
+    "\n",
+    "The data come from the concordance produced by PERDIDO."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the TSV file into a dataframe\n",
+    "# TSV file generated by the script /Users/lmoncla/Nextcloud/Recherche/Projets/2019-MSH_GéoDISCO/Scripts/parsers/concordancierPERDIDO.py\n",
+    "\n",
+    "#data = pd.read_csv('../Data/statsPERDIDO_EDDAGeo.tsv', sep='\\t')\n",
+    "data = pd.read_csv('../Data/statsPERDIDO_EDDA_21_10_11.tsv', sep='\\t')\n",
+    "data = data.sort_values(by=['volume', 'number'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>6861</th>\n",
+       "      <td>volume01-1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ENCYCLOPÉDIE, DICTIONNAIRE RAISONNÉ DES SCIENC...</td>\n",
+       "      <td>Title Page</td>\n",
+       "      <td>unclassified</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>129</td>\n",
+       "      <td>24</td>\n",
+       "      <td>10</td>\n",
+       "      <td>4</td>\n",
+       "      <td>8</td>\n",
+       "      <td>4</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>22907</th>\n",
+       "      <td>volume01-2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>A MONSEIGNEUR LE COMTE D'ARGENSON, MINISTRE ET...</td>\n",
+       "      <td>A MONSEIGNEUR LE COMTE D'ARGENSON</td>\n",
+       "      <td>unclassified</td>\n",
+       "      <td>Diderot &amp; d'Alembert</td>\n",
+       "      <td>252</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13935</th>\n",
+       "      <td>volume01-3</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3</td>\n",
+       "      <td>DISCOURS PRÉLIMINAIRE DES EDITEURS. L'Encyclop...</td>\n",
+       "      <td>DISCOURS PRÉLIMINAIRE DES EDITEURS</td>\n",
+       "      <td>unclassified</td>\n",
+       "      <td>d'Alembert</td>\n",
+       "      <td>49007</td>\n",
+       "      <td>1013</td>\n",
+       "      <td>379</td>\n",
+       "      <td>177</td>\n",
+       "      <td>279</td>\n",
+       "      <td>6</td>\n",
+       "      <td>19</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17096</th>\n",
+       "      <td>volume01-4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ENCYCLOPÉDIE, DICTIONNAIRE RAISONNÉ DES SCIENC...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>20692</th>\n",
+       "      <td>volume01-5</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5</td>\n",
+       "      <td>A, a &amp; a s.m. (ordre Encyclopéd. Entend. Scien...</td>\n",
+       "      <td>A, a &amp; a</td>\n",
+       "      <td>Grammaire</td>\n",
+       "      <td>Dumarsais5</td>\n",
+       "      <td>856</td>\n",
+       "      <td>28</td>\n",
+       "      <td>8</td>\n",
+       "      <td>11</td>\n",
+       "      <td>7</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         filename  volume  number  \\\n",
+       "6861   volume01-1       1       1   \n",
+       "22907  volume01-2       1       2   \n",
+       "13935  volume01-3       1       3   \n",
+       "17096  volume01-4       1       4   \n",
+       "20692  volume01-5       1       5   \n",
+       "\n",
+       "                                                 content  \\\n",
+       "6861   ENCYCLOPÉDIE, DICTIONNAIRE RAISONNÉ DES SCIENC...   \n",
+       "22907  A MONSEIGNEUR LE COMTE D'ARGENSON, MINISTRE ET...   \n",
+       "13935  DISCOURS PRÉLIMINAIRE DES EDITEURS. L'Encyclop...   \n",
+       "17096  ENCYCLOPÉDIE, DICTIONNAIRE RAISONNÉ DES SCIENC...   \n",
+       "20692  A, a & a s.m. (ordre Encyclopéd. Entend. Scien...   \n",
+       "\n",
+       "                                 headword     normClass                author  \\\n",
+       "6861                           Title Page  unclassified              unsigned   \n",
+       "22907   A MONSEIGNEUR LE COMTE D'ARGENSON  unclassified  Diderot & d'Alembert   \n",
+       "13935  DISCOURS PRÉLIMINAIRE DES EDITEURS  unclassified            d'Alembert   \n",
+       "17096                                 NaN           NaN                   NaN   \n",
+       "20692                            A, a & a     Grammaire            Dumarsais5   \n",
+       "\n",
+       "       nb Words  nb EN  nb Name EDDA  nb Person  nb ENE  nb ENE Place  \\\n",
+       "6861        129     24            10          4       8             4   \n",
+       "22907       252      5             0          0       3             0   \n",
+       "13935     49007   1013           379        177     279             6   \n",
+       "17096        10      0             0          0       0             0   \n",
+       "20692       856     28             8         11       7             1   \n",
+       "\n",
+       "       nb ENE Person  nb EN geocoded  nb EN EDDA geocoded type  latlong  \\\n",
+       "6861               0               0                    0  NaN    False   \n",
+       "22907              2               0                    0  NaN    False   \n",
+       "13935             19               0                    0  NaN    False   \n",
+       "17096              0               0                    0  NaN    False   \n",
+       "20692              1               0                    0  NaN    False   \n",
+       "\n",
+       "       latlong value  \n",
+       "6861             NaN  \n",
+       "22907            NaN  \n",
+       "13935            NaN  \n",
+       "17096            NaN  \n",
+       "20692            NaN  "
+      ]
+     },
+     "execution_count": 53,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# display the first rows\n",
+    "data.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "74165"
+      ]
+     },
+     "execution_count": 54,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# number of articles in this dataset\n",
+    "len(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.1. Word-count quartiles for all the articles in the dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "25.0 43.0 86.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "q1, q2, q3 = data['nb Words'].quantile([0.25, 0.5, 0.75])\n",
+    "\n",
+    "print(q1, q2, q3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Filtering on whether the first sentence contains \"de France\" preceded by a classifier"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "splitter = SentenceSplitter(language='fr')\n",
+    "\n",
+    "def filtreFrance(content):\n",
+    "    # initialise the two output variables\n",
+    "    found = False\n",
+    "    classifieur = ''\n",
+    "    \n",
+    "    # words that can act as classifiers of \"de France\" (e.g. \"ville de France\")\n",
+    "    list_classifieurs = \"ville|Ville|riviere|rivieres|ile|Ile|île|isle|iles|îles|province|fleuve|bourg|Bourg|montagne|montagnes|lieu|royaume|Royaume|pays|village|port|bourgade|promontoire|Promontoire|comté|lac|lacs|forteresse|golfe|golphe|cap|capitale|canton|vallée|place|principauté|château|fauxbourg|fauxbourgs|fontaine|forêt|forêts|gouvernement|municipe|maison|nation|palatinat|Palatinat|campagne|duché|bailliage|bois|capitainerie|contrée|état|marais|cercle|district|eaux|écueil|écueils|paroisse|plaine|quartier|champ|endroit|forum|Forum|havre|passage|pont|ruisseau|terre|torrent|volcan|abbaye|baronie|capitainie|champ|champs|chef|chemin|cité|colline|désert|empire|détroit|entrepôt|fauxbourg|grotte|habitation|isthme|marquisat|mont|mur|palais|péninsule|préfecture|Province|rade|région|rocher|route|ruines|salines|seigneurie|station|territoire|hameau|mer|rue\"\n",
+    "       \n",
+    "    # split the text into sentences, then search the first one for \"<classifier> [up to 3 words] de France\"\n",
+    "    sentences = splitter.split(text=content)\n",
+    "    m = re.search(\"(\"+list_classifieurs+r\") (\\w+\\s){0,3}de France\", sentences[0])\n",
+    "    if m:\n",
+    "        found = True\n",
+    "        classifieur = m.group(1)\n",
+    "    else:\n",
+    "        pos = sentences[0].find('de France')\n",
+    "        if pos > -1:\n",
+    "            found = True\n",
+    "        \n",
+    "    return found, classifieur\n",
+    "\n",
+    "# np.vectorize applies the function element-wise to the column; it is a convenience wrapper (essentially a Python loop), not a performance optimization\n",
+    "v_filtreFrance = np.vectorize(filtreFrance)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data['de France'], data['classifieur de France'] = v_filtreFrance(data.content)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_france = data[(data['de France'] == True)]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_france_cl = data_france[(data_france['classifieur de France'] != \"\")]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(1450, 21)"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_france.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(1415, 21)"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_france_cl.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "There are 1415 articles with \"de France\" and a classifier, and 35 without a classifier\n"
+     ]
+    }
+   ],
+   "source": [
+    "print('There are ' + str(len(data_france_cl)) + ' articles with \"de France\" and a classifier, and ' + str(len(data_france) - len(data_france_cl)) + ' without a classifier')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Splitting into subgroups by author\n",
+    "\n",
+    "### 4.1. Geography articles signed by Diderot"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(172, 21)"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_diderot = data_france_cl[(data_france_cl['author'] == 'Diderot')]\n",
+    "data_diderot.shape\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>...</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "      <th>de France</th>\n",
+       "      <th>classifieur de France</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>4962</th>\n",
+       "      <td>volume01-890</td>\n",
+       "      <td>1</td>\n",
+       "      <td>890</td>\n",
+       "      <td>* ADOUR, (Géog. mod.) riviere de France qui pr...</td>\n",
+       "      <td>ADOUR</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>42</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>hydronyme</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>riviere</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3616</th>\n",
+       "      <td>volume01-1065</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1065</td>\n",
+       "      <td>* Afrique, (Géog. mod.) petite ville de France...</td>\n",
+       "      <td>Afrique</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>12</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13155</th>\n",
+       "      <td>volume01-1087</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1087</td>\n",
+       "      <td>* AGDE, (Géog.) ville de France en Languedoc, ...</td>\n",
+       "      <td>AGDE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>34</td>\n",
+       "      <td>6</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8738</th>\n",
+       "      <td>volume01-1103</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1103</td>\n",
+       "      <td>* AGEN, (Géog.) ancienne ville de France, capi...</td>\n",
+       "      <td>AGEN</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>28</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1678</th>\n",
+       "      <td>volume01-1210</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1210</td>\n",
+       "      <td>* AGRERE (Géog.) petite ville de France dans l...</td>\n",
+       "      <td>AGRERE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>13</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>5 rows × 21 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "            filename  volume  number  \\\n",
+       "4962    volume01-890       1     890   \n",
+       "3616   volume01-1065       1    1065   \n",
+       "13155  volume01-1087       1    1087   \n",
+       "8738   volume01-1103       1    1103   \n",
+       "1678   volume01-1210       1    1210   \n",
+       "\n",
+       "                                                 content headword  \\\n",
+       "4962   * ADOUR, (Géog. mod.) riviere de France qui pr...    ADOUR   \n",
+       "3616   * Afrique, (Géog. mod.) petite ville de France...  Afrique   \n",
+       "13155  * AGDE, (Géog.) ville de France en Languedoc, ...     AGDE   \n",
+       "8738   * AGEN, (Géog.) ancienne ville de France, capi...     AGEN   \n",
+       "1678   * AGRERE (Géog.) petite ville de France dans l...   AGRERE   \n",
+       "\n",
+       "                normClass   author  nb Words  nb EN  nb Name EDDA  ...  \\\n",
+       "4962   Géographie moderne  Diderot        42      4             4  ...   \n",
+       "3616   Géographie moderne  Diderot        12      4             4  ...   \n",
+       "13155          Géographie  Diderot        34      6             4  ...   \n",
+       "8738           Géographie  Diderot        28      5             4  ...   \n",
+       "1678           Géographie  Diderot        13      2             2  ...   \n",
+       "\n",
+       "       nb ENE  nb ENE Place  nb ENE Person  nb EN geocoded  \\\n",
+       "4962        2             2              0               3   \n",
+       "3616        2             2              0               3   \n",
+       "13155       3             3              0               4   \n",
+       "8738        3             3              0               3   \n",
+       "1678        1             1              0               1   \n",
+       "\n",
+       "       nb EN EDDA geocoded       type latlong  latlong value  de France  \\\n",
+       "4962                     3  hydronyme   False            NaN       True   \n",
+       "3616                     3      ville   False            NaN       True   \n",
+       "13155                    3      ville    True            NaN       True   \n",
+       "8738                     3      ville    True            NaN       True   \n",
+       "1678                     1      ville   False            NaN       True   \n",
+       "\n",
+       "       classifieur de France  \n",
+       "4962                 riviere  \n",
+       "3616                   ville  \n",
+       "13155                  ville  \n",
+       "8738                   ville  \n",
+       "1678                   ville  \n",
+       "\n",
+       "[5 rows x 21 columns]"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_diderot.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4.2. Geography articles signed by Jaucourt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(716, 21)"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_jaucourt = data_france_cl[(data_france_cl['author'] == 'Jaucourt')]\n",
+    "data_jaucourt.shape\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4.3. Geography articles signed by another author"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(2, 21)"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_other = data_france_cl[(data_france_cl['author'] != 'Diderot') & (data_france_cl['author'] != 'Jaucourt') & (data_france_cl['author'] != 'unsigned') ]\n",
+    "data_other.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>...</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "      <th>de France</th>\n",
+       "      <th>classifieur de France</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>9134</th>\n",
+       "      <td>volume13-126</td>\n",
+       "      <td>13</td>\n",
+       "      <td>126</td>\n",
+       "      <td>PONS, (Géog. mod.) en latin Pontes, petite vil...</td>\n",
+       "      <td>PONS</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt &amp; Jaucourt</td>\n",
+       "      <td>870</td>\n",
+       "      <td>74</td>\n",
+       "      <td>37</td>\n",
+       "      <td>...</td>\n",
+       "      <td>34</td>\n",
+       "      <td>18</td>\n",
+       "      <td>4</td>\n",
+       "      <td>18</td>\n",
+       "      <td>17</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6188</th>\n",
+       "      <td>volume13-218</td>\n",
+       "      <td>13</td>\n",
+       "      <td>218</td>\n",
+       "      <td>PONT-SUR-SEINE, (Géog. mod.) en latin moderne ...</td>\n",
+       "      <td>PONT-SUR-SEINE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt5</td>\n",
+       "      <td>70</td>\n",
+       "      <td>9</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>2 rows × 21 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "          filename  volume  number  \\\n",
+       "9134  volume13-126      13     126   \n",
+       "6188  volume13-218      13     218   \n",
+       "\n",
+       "                                                content        headword  \\\n",
+       "9134  PONS, (Géog. mod.) en latin Pontes, petite vil...            PONS   \n",
+       "6188  PONT-SUR-SEINE, (Géog. mod.) en latin moderne ...  PONT-SUR-SEINE   \n",
+       "\n",
+       "               normClass               author  nb Words  nb EN  nb Name EDDA  \\\n",
+       "9134  Géographie moderne  Jaucourt & Jaucourt       870     74            37   \n",
+       "6188  Géographie moderne            Jaucourt5        70      9             5   \n",
+       "\n",
+       "      ...  nb ENE  nb ENE Place  nb ENE Person  nb EN geocoded  \\\n",
+       "9134  ...      34            18              4              18   \n",
+       "6188  ...       4             2              0               4   \n",
+       "\n",
+       "      nb EN EDDA geocoded  type latlong  latlong value  de France  \\\n",
+       "9134                   17   NaN    True            NaN       True   \n",
+       "6188                    3   NaN    True            NaN       True   \n",
+       "\n",
+       "      classifieur de France  \n",
+       "9134                  ville  \n",
+       "6188                  ville  \n",
+       "\n",
+       "[2 rows x 21 columns]"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_other"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4.4. Unsigned geography articles"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(525, 21)"
+      ]
+     },
+     "execution_count": 27,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_unsigned = data_france_cl[(data_france_cl['author'] == 'unsigned')]\n",
+    "data_unsigned.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Random sampling\n",
+    "\n",
+    "### 5.1. Computing quartiles"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 25th, 50th and 75th percentiles of article length (in words) for each author group\n",
+    "d_q1, d_q2, d_q3 = data_diderot['nb Words'].quantile([0.25, 0.5, 0.75])\n",
+    "j_q1, j_q2, j_q3 = data_jaucourt['nb Words'].quantile([0.25, 0.5, 0.75])\n",
+    "u_q1, u_q2, u_q3 = data_unsigned['nb Words'].quantile([0.25, 0.5, 0.75])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Diderot (172 articles) : q1 : 15.0 - q2 : 19.0 - q3 : 24.0\n",
+      "Jaucourt (716 articles) : q1 : 42.0 - q2 : 71.0 - q3 : 165.0\n",
+      "Unsigned (525 articles) : q1 : 15.0 - q2 : 21.0 - q3 : 32.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f'Diderot ({len(data_diderot)} articles) : q1 : {d_q1} - q2 : {d_q2} - q3 : {d_q3}')\n",
+    "print(f'Jaucourt ({len(data_jaucourt)} articles) : q1 : {j_q1} - q2 : {j_q2} - q3 : {j_q3}')\n",
+    "print(f'Unsigned ({len(data_unsigned)} articles) : q1 : {u_q1} - q2 : {u_q2} - q3 : {u_q3}')\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Partition into four groups with half-open intervals: [min, q1), [q1, q2), [q2, q3), [q3, max]\n",
+    "data_diderot_q1 = data_diderot[(data_diderot['nb Words'] < d_q1)]\n",
+    "data_diderot_q2 = data_diderot[(data_diderot['nb Words'] >= d_q1) & (data_diderot['nb Words'] < d_q2)]\n",
+    "data_diderot_q3 = data_diderot[(data_diderot['nb Words'] >= d_q2) & (data_diderot['nb Words'] < d_q3)]\n",
+    "data_diderot_q4 = data_diderot[(data_diderot['nb Words'] >= d_q3)]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "36 47 40 49\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(data_diderot_q1), len(data_diderot_q2), len(data_diderot_q3), len(data_diderot_q4))\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_jaucourt_q1 = data_jaucourt[(data_jaucourt['nb Words'] < j_q1)]\n",
+    "data_jaucourt_q2 = data_jaucourt[(data_jaucourt['nb Words'] >= j_q1) & (data_jaucourt['nb Words'] < j_q2)]\n",
+    "data_jaucourt_q3 = data_jaucourt[(data_jaucourt['nb Words'] >= j_q2) & (data_jaucourt['nb Words'] < j_q3)]\n",
+    "data_jaucourt_q4 = data_jaucourt[(data_jaucourt['nb Words'] >= j_q3)]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "176 178 182 180\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(data_jaucourt_q1), len(data_jaucourt_q2), len(data_jaucourt_q3), len(data_jaucourt_q4))\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Strict < for q1 (as for Diderot and Jaucourt) so that articles with exactly u_q1 words fall only in q2\n",
+    "data_unsigned_q1 = data_unsigned[(data_unsigned['nb Words'] < u_q1)]\n",
+    "data_unsigned_q2 = data_unsigned[(data_unsigned['nb Words'] >= u_q1) & (data_unsigned['nb Words'] < u_q2)]\n",
+    "data_unsigned_q3 = data_unsigned[(data_unsigned['nb Words'] >= u_q2) & (data_unsigned['nb Words'] < u_q3)]\n",
+    "data_unsigned_q4 = data_unsigned[(data_unsigned['nb Words'] >= u_q3)]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "125 132 134 134\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(len(data_unsigned_q1), len(data_unsigned_q2), len(data_unsigned_q3), len(data_unsigned_q4))\n"
+   ]
+  },
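+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A possible safeguard for the quartile split, sketched as a hypothetical helper `quartile_groups` (not part of the original pipeline). It labels each article with its word-count quartile via `pd.cut` using half-open bins `[lower, upper)`, the convention of the q2 to q4 filters above, and checks that every article lands in exactly one group. It assumes the three quartiles are distinct (otherwise `pd.cut` raises a `ValueError`) and that `nb Words` has no missing values.\n",
+    "\n",
+    "Usage: `quartile_groups(data_unsigned).value_counts(sort=False)` gives the size of each group. The counts should sum to `len(data_unsigned)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "def quartile_groups(df, col='nb Words'):\n",
+    "    # Hypothetical helper (not in the original notebook): label each row with\n",
+    "    # its word-count quartile using half-open bins [lower, upper).\n",
+    "    q1, q2, q3 = df[col].quantile([0.25, 0.5, 0.75])\n",
+    "    bins = [-np.inf, q1, q2, q3, np.inf]\n",
+    "    labels = pd.cut(df[col], bins=bins, right=False,\n",
+    "                    labels=['q1', 'q2', 'q3', 'q4'])\n",
+    "    # Sanity check: every article falls into exactly one quartile group.\n",
+    "    assert labels.notna().all(), 'some rows were not assigned a quartile'\n",
+    "    return labels"
+   ]
+  },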
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5.2. Random selection per subgroup"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sizes of the two nested random samples drawn from each quartile\n",
+    "# (the sample5_* variables hold s_small = 4 articles per quartile)\n",
+    "s_small = 4\n",
+    "s_big = 10\n",
+    "\n",
+    "sample10_diderot_q1 = data_diderot_q1.sample(s_big)\n",
+    "sample10_diderot_q2 = data_diderot_q2.sample(s_big)\n",
+    "sample10_diderot_q3 = data_diderot_q3.sample(s_big)\n",
+    "sample10_diderot_q4 = data_diderot_q4.sample(s_big)\n",
+    "\n",
+    "# The small sample is drawn from the big one, so it is a subset of it\n",
+    "sample5_diderot_q1 = sample10_diderot_q1.sample(s_small)\n",
+    "sample5_diderot_q2 = sample10_diderot_q2.sample(s_small)\n",
+    "sample5_diderot_q3 = sample10_diderot_q3.sample(s_small)\n",
+    "sample5_diderot_q4 = sample10_diderot_q4.sample(s_small)\n",
+    "\n",
+    "sample10_jaucourt_q1 = data_jaucourt_q1.sample(s_big)\n",
+    "sample10_jaucourt_q2 = data_jaucourt_q2.sample(s_big)\n",
+    "sample10_jaucourt_q3 = data_jaucourt_q3.sample(s_big)\n",
+    "sample10_jaucourt_q4 = data_jaucourt_q4.sample(s_big)\n",
+    "\n",
+    "sample5_jaucourt_q1 = sample10_jaucourt_q1.sample(s_small)\n",
+    "sample5_jaucourt_q2 = sample10_jaucourt_q2.sample(s_small)\n",
+    "sample5_jaucourt_q3 = sample10_jaucourt_q3.sample(s_small)\n",
+    "sample5_jaucourt_q4 = sample10_jaucourt_q4.sample(s_small)\n",
+    "\n",
+    "sample10_unsigned_q1 = data_unsigned_q1.sample(s_big)\n",
+    "sample10_unsigned_q2 = data_unsigned_q2.sample(s_big)\n",
+    "sample10_unsigned_q3 = data_unsigned_q3.sample(s_big)\n",
+    "sample10_unsigned_q4 = data_unsigned_q4.sample(s_big)\n",
+    "\n",
+    "sample5_unsigned_q1 = sample10_unsigned_q1.sample(s_small)\n",
+    "sample5_unsigned_q2 = sample10_unsigned_q2.sample(s_small)\n",
+    "sample5_unsigned_q3 = sample10_unsigned_q3.sample(s_small)\n",
+    "sample5_unsigned_q4 = sample10_unsigned_q4.sample(s_small)"
+   ]
+  },
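+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `.sample()` calls above do not set `random_state`, so every run draws a different selection. The following is a minimal sketch of a seeded alternative, assuming pandas >= 1.1 (for `DataFrameGroupBy.sample`). `stratified_sample` and its `seed` argument are hypothetical names, not part of the original notebook. It bins articles on the same half-open quartile intervals and draws `n_per_group` articles from each quartile.\n",
+    "\n",
+    "Usage: `stratified_sample(data_diderot, 10, seed=42)`. A nested smaller sample can then be drawn with `.groupby('quartile', observed=True).sample(n=4, random_state=42)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "def stratified_sample(df, n_per_group, seed=None, col='nb Words'):\n",
+    "    # Hypothetical helper: draw n_per_group articles from each word-count\n",
+    "    # quartile; a fixed seed makes the draw reproducible.\n",
+    "    q1, q2, q3 = df[col].quantile([0.25, 0.5, 0.75])\n",
+    "    bins = [float('-inf'), q1, q2, q3, float('inf')]\n",
+    "    quartile = pd.cut(df[col], bins=bins, right=False,\n",
+    "                      labels=['q1', 'q2', 'q3', 'q4'])\n",
+    "    # DataFrameGroupBy.sample (pandas >= 1.1) samples within each group;\n",
+    "    # it raises ValueError if a group has fewer than n_per_group rows.\n",
+    "    return (df.assign(quartile=quartile)\n",
+    "              .groupby('quartile', observed=True)\n",
+    "              .sample(n=n_per_group, random_state=seed))"
+   ]
+  },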
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample10_diderot = pd.concat([sample10_diderot_q1,sample10_diderot_q2,sample10_diderot_q3,sample10_diderot_q4], ignore_index=True)\n",
+    "sample10_jaucourt = pd.concat([sample10_jaucourt_q1,sample10_jaucourt_q2,sample10_jaucourt_q3,sample10_jaucourt_q4], ignore_index=True)\n",
+    "sample10_unsigned = pd.concat([sample10_unsigned_q1,sample10_unsigned_q2,sample10_unsigned_q3,sample10_unsigned_q4], ignore_index=True)\n",
+    "\n",
+    "sample10 = pd.concat([sample10_diderot, sample10_jaucourt, sample10_unsigned], ignore_index=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample5_diderot = pd.concat([sample5_diderot_q1,sample5_diderot_q2,sample5_diderot_q3,sample5_diderot_q4], ignore_index=True)\n",
+    "sample5_jaucourt = pd.concat([sample5_jaucourt_q1,sample5_jaucourt_q2,sample5_jaucourt_q3,sample5_jaucourt_q4], ignore_index=True)\n",
+    "sample5_unsigned = pd.concat([sample5_unsigned_q1,sample5_unsigned_q2,sample5_unsigned_q3,sample5_unsigned_q4], ignore_index=True)\n",
+    "\n",
+    "sample5 = pd.concat([sample5_diderot, sample5_jaucourt, sample5_unsigned], ignore_index=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>...</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "      <th>de France</th>\n",
+       "      <th>classifieur de France</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>volume02-1504</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1504</td>\n",
+       "      <td>* BEIRE, (Géog.) petite ville de France, en Bo...</td>\n",
+       "      <td>BEIRE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>12</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>volume01-2599</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2599</td>\n",
+       "      <td>* ANDONVILLE, (Géog. mod.) ville de France, gé...</td>\n",
+       "      <td>ANDONVILLE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>12</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>volume01-1065</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1065</td>\n",
+       "      <td>* Afrique, (Géog. mod.) petite ville de France...</td>\n",
+       "      <td>Afrique</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>12</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>volume02-1391</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1391</td>\n",
+       "      <td>* BEAUVOISIS ou BEAUVAISIS, (Géog.) petit pays...</td>\n",
+       "      <td>BEAUVOISIS ou BEAUVAISIS</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>13</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>pays</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>volume01-4843</td>\n",
+       "      <td>1</td>\n",
+       "      <td>4843</td>\n",
+       "      <td>* AUBETERRE (Géog.) ville de France, dans l'An...</td>\n",
+       "      <td>AUBETERRE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>17</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>volume01-3831</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3831</td>\n",
+       "      <td>* ARGENCES, (Géog.) bourg de France en basse N...</td>\n",
+       "      <td>ARGENCES</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>17</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>bourg</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>volume01-5034</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5034</td>\n",
+       "      <td>* AUNEAU (Géographie.) petite ville de France,...</td>\n",
+       "      <td>AUNEAU</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>16</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>volume01-3079</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3079</td>\n",
+       "      <td>* ANTRAIN ou ENTRAINS, (Géog. mod.) petite vil...</td>\n",
+       "      <td>ANTRAIN ou ENTRAINS</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>15</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>volume02-363</td>\n",
+       "      <td>2</td>\n",
+       "      <td>363</td>\n",
+       "      <td>* BALLON (Géog.) ville de France, au diocese d...</td>\n",
+       "      <td>BALLON</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>22</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>volume01-1279</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1279</td>\n",
+       "      <td>* Aigle, (Géog.) petite ville de France dans l...</td>\n",
+       "      <td>Aigle</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>19</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>volume02-705</td>\n",
+       "      <td>2</td>\n",
+       "      <td>705</td>\n",
+       "      <td>* BARENTON (Géog.) petite ville de France, dan...</td>\n",
+       "      <td>BARENTON</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>20</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>volume01-2810</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2810</td>\n",
+       "      <td>* ANNONAY, (Géog. mod.) petite ville de France...</td>\n",
+       "      <td>ANNONAY</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>20</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>volume02-1614</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1614</td>\n",
+       "      <td>* BENAUGE, (Géog.) petite contrée de la Guienn...</td>\n",
+       "      <td>BENAUGE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>24</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>province</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>volume01-1565</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1565</td>\n",
+       "      <td>* ALBI, (Géog.) ville de France, capitale de  ...</td>\n",
+       "      <td>ALBI</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>25</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>volume01-5144</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5144</td>\n",
+       "      <td>* AUTUN, (Géog.) ville de France au duché de B...</td>\n",
+       "      <td>AUTUN</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>27</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>volume02-1564</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1564</td>\n",
+       "      <td>* BELLE-ISLE, (Géog.) île de France à six lieu...</td>\n",
+       "      <td>BELLE-ISLE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>28</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>île</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>île</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>volume15-2668</td>\n",
+       "      <td>15</td>\n",
+       "      <td>2668</td>\n",
+       "      <td>STRENGENBACH ou STRENGBACH, le, (Géog. mod.) r...</td>\n",
+       "      <td>STRENGENBACH ou STRENGBACH, le</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>30</td>\n",
+       "      <td>6</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>hydronyme</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>riviere</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17</th>\n",
+       "      <td>volume15-4368</td>\n",
+       "      <td>15</td>\n",
+       "      <td>4368</td>\n",
+       "      <td>TARDÉNOIS, le (Géog. mod.) en latin du moyen â...</td>\n",
+       "      <td>TARDÉNOIS, le</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>34</td>\n",
+       "      <td>7</td>\n",
+       "      <td>6</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>pays</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>18</th>\n",
+       "      <td>volume07-2017</td>\n",
+       "      <td>7</td>\n",
+       "      <td>2017</td>\n",
+       "      <td>Germain-Laval, (Saint-) Géog. ville de France ...</td>\n",
+       "      <td>Germain-Laval, (Saint-)</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>38</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>19</th>\n",
+       "      <td>volume16-4274</td>\n",
+       "      <td>16</td>\n",
+       "      <td>4274</td>\n",
+       "      <td>Valence, (Géog. mod.) nos géographes disent pe...</td>\n",
+       "      <td>Valence</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>32</td>\n",
+       "      <td>5</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>20</th>\n",
+       "      <td>volume10-381</td>\n",
+       "      <td>10</td>\n",
+       "      <td>381</td>\n",
+       "      <td>MARCELLIN, S. (Géog.) petite ville de France e...</td>\n",
+       "      <td>MARCELLIN</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>55</td>\n",
+       "      <td>9</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>21</th>\n",
+       "      <td>volume13-514</td>\n",
+       "      <td>13</td>\n",
+       "      <td>514</td>\n",
+       "      <td>PORTO-CROS, (Géog. mod.) petite île de France ...</td>\n",
+       "      <td>PORTO-CROS</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>48</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>île</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>île</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>22</th>\n",
+       "      <td>volume12-3457</td>\n",
+       "      <td>12</td>\n",
+       "      <td>3457</td>\n",
+       "      <td>PLOERMEL, (Géog. mod.) petite ville de France ...</td>\n",
+       "      <td>PLOERMEL</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>46</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>23</th>\n",
+       "      <td>volume14-2693</td>\n",
+       "      <td>14</td>\n",
+       "      <td>2693</td>\n",
+       "      <td>RUFFEC, (Géog. mod.) petite ville de France, d...</td>\n",
+       "      <td>RUFFEC</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>47</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>24</th>\n",
+       "      <td>volume17-1439</td>\n",
+       "      <td>17</td>\n",
+       "      <td>1439</td>\n",
+       "      <td>VODABLE, (Géog. mod.) bourg de France dans l'A...</td>\n",
+       "      <td>VODABLE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>76</td>\n",
+       "      <td>8</td>\n",
+       "      <td>7</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>6</td>\n",
+       "      <td>6</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>bourg</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25</th>\n",
+       "      <td>volume13-208</td>\n",
+       "      <td>13</td>\n",
+       "      <td>208</td>\n",
+       "      <td>PONTIVY, (Géog. mod.) petite ville de France, ...</td>\n",
+       "      <td>PONTIVY</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>145</td>\n",
+       "      <td>15</td>\n",
+       "      <td>10</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>11</td>\n",
+       "      <td>9</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>26</th>\n",
+       "      <td>volume09-524</td>\n",
+       "      <td>9</td>\n",
+       "      <td>524</td>\n",
+       "      <td>KAYSERBERG, (Géog.) c'est-à-dire mont de l'emp...</td>\n",
+       "      <td>KAYSERBERG</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>130</td>\n",
+       "      <td>17</td>\n",
+       "      <td>7</td>\n",
+       "      <td>...</td>\n",
+       "      <td>7</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>6</td>\n",
+       "      <td>4</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>27</th>\n",
+       "      <td>volume11-1060</td>\n",
+       "      <td>11</td>\n",
+       "      <td>1060</td>\n",
+       "      <td>Nogent-le-Rotrou, (Géog.) gros bourg de France...</td>\n",
+       "      <td>Nogent-le-Rotrou</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>123</td>\n",
+       "      <td>17</td>\n",
+       "      <td>7</td>\n",
+       "      <td>...</td>\n",
+       "      <td>9</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>bourg</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>28</th>\n",
+       "      <td>volume12-2242</td>\n",
+       "      <td>12</td>\n",
+       "      <td>2242</td>\n",
+       "      <td>PICARDIE, la, (Géog. mod.) province de France,...</td>\n",
+       "      <td>PICARDIE, la</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>432</td>\n",
+       "      <td>60</td>\n",
+       "      <td>35</td>\n",
+       "      <td>...</td>\n",
+       "      <td>14</td>\n",
+       "      <td>5</td>\n",
+       "      <td>3</td>\n",
+       "      <td>16</td>\n",
+       "      <td>11</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>province</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>29</th>\n",
+       "      <td>volume11-3735</td>\n",
+       "      <td>11</td>\n",
+       "      <td>3735</td>\n",
+       "      <td>Palais, (Géograph. mod.) petite place forte de...</td>\n",
+       "      <td>Palais</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>515</td>\n",
+       "      <td>42</td>\n",
+       "      <td>17</td>\n",
+       "      <td>...</td>\n",
+       "      <td>16</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2</td>\n",
+       "      <td>8</td>\n",
+       "      <td>7</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>place</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>30</th>\n",
+       "      <td>volume14-2568</td>\n",
+       "      <td>14</td>\n",
+       "      <td>2568</td>\n",
+       "      <td>ROUSSILLON, le, (Géog. mod.) en latin Ruscinon...</td>\n",
+       "      <td>ROUSSILLON, le</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>554</td>\n",
+       "      <td>54</td>\n",
+       "      <td>35</td>\n",
+       "      <td>...</td>\n",
+       "      <td>22</td>\n",
+       "      <td>9</td>\n",
+       "      <td>3</td>\n",
+       "      <td>16</td>\n",
+       "      <td>14</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>province</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>31</th>\n",
+       "      <td>volume17-636</td>\n",
+       "      <td>17</td>\n",
+       "      <td>636</td>\n",
+       "      <td>Vic-le-comte, (Géog. mod.) petite ville de Fra...</td>\n",
+       "      <td>Vic-le-comte</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>274</td>\n",
+       "      <td>19</td>\n",
+       "      <td>9</td>\n",
+       "      <td>...</td>\n",
+       "      <td>9</td>\n",
+       "      <td>3</td>\n",
+       "      <td>1</td>\n",
+       "      <td>8</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>32</th>\n",
+       "      <td>volume02-6579</td>\n",
+       "      <td>2</td>\n",
+       "      <td>6579</td>\n",
+       "      <td>CERNIN, (Saint) Géog. petite ville de France, ...</td>\n",
+       "      <td>CERNIN (Saint)</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>10</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>33</th>\n",
+       "      <td>volume02-4075</td>\n",
+       "      <td>2</td>\n",
+       "      <td>4075</td>\n",
+       "      <td>Bruges, (Géog.) petite ville de France, dans l...</td>\n",
+       "      <td>Bruges</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>14</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>34</th>\n",
+       "      <td>volume02-4564</td>\n",
+       "      <td>2</td>\n",
+       "      <td>4564</td>\n",
+       "      <td>CADENAC, (Géog.) petite ville de France dans l...</td>\n",
+       "      <td>CADENAC</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>14</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>35</th>\n",
+       "      <td>volume02-6305</td>\n",
+       "      <td>2</td>\n",
+       "      <td>6305</td>\n",
+       "      <td>CAYLAR, (le) Géog. petite ville de France, dan...</td>\n",
+       "      <td>CAYLAR (le)</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>12</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>36</th>\n",
+       "      <td>volume03-3769</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3769</td>\n",
+       "      <td>CONDOM, (Géog. mod.) ville de France en Gascog...</td>\n",
+       "      <td>CONDOM</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>19</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>37</th>\n",
+       "      <td>volume17-2104</td>\n",
+       "      <td>17</td>\n",
+       "      <td>2104</td>\n",
+       "      <td>WASSELONNE, (Géog. mod.) bourg ou petite ville...</td>\n",
+       "      <td>WASSELONNE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>19</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>bourg</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>38</th>\n",
+       "      <td>volume04-2909</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2909</td>\n",
+       "      <td>CUSSET, (Géog. mod.) petite ville de France en...</td>\n",
+       "      <td>CUSSET</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>15</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>39</th>\n",
+       "      <td>volume02-3396</td>\n",
+       "      <td>2</td>\n",
+       "      <td>3396</td>\n",
+       "      <td>BOUTONNE, (Géog.) riviere de France, qui prend...</td>\n",
+       "      <td>BOUTONNE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>18</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>hydronyme</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>riviere</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>40</th>\n",
+       "      <td>volume03-2391</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2391</td>\n",
+       "      <td>Clermont, (Géog. mod.) petite ville de France,...</td>\n",
+       "      <td>Clermont</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>28</td>\n",
+       "      <td>7</td>\n",
+       "      <td>5</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>41</th>\n",
+       "      <td>volume02-3250</td>\n",
+       "      <td>2</td>\n",
+       "      <td>3250</td>\n",
+       "      <td>Bourg-en-Bresse, (Géog.) ville de France, capi...</td>\n",
+       "      <td>Bourg-en-Bresse</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>27</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>42</th>\n",
+       "      <td>volume09-2586</td>\n",
+       "      <td>9</td>\n",
+       "      <td>2586</td>\n",
+       "      <td>LIMOURS, (Géog.) petite ville de France dans l...</td>\n",
+       "      <td>LIMOURS</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>26</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>43</th>\n",
+       "      <td>volume17-575</td>\n",
+       "      <td>17</td>\n",
+       "      <td>575</td>\n",
+       "      <td>VEUDRE, (Géog. mod.) petite ville ou bourg de ...</td>\n",
+       "      <td>VEUDRE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>23</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>44</th>\n",
+       "      <td>volume10-3018</td>\n",
+       "      <td>10</td>\n",
+       "      <td>3018</td>\n",
+       "      <td>MONT-TRICHARD, (Géog.) ancienne petite ville d...</td>\n",
+       "      <td>MONT-TRICHARD</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>42</td>\n",
+       "      <td>7</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>45</th>\n",
+       "      <td>volume14-4758</td>\n",
+       "      <td>14</td>\n",
+       "      <td>4758</td>\n",
+       "      <td>SECLIN, (Géog. mod.) en latin moderne Sacilium...</td>\n",
+       "      <td>SECLIN</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>46</td>\n",
+       "      <td>7</td>\n",
+       "      <td>3</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>bourg</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>46</th>\n",
+       "      <td>volume10-1805</td>\n",
+       "      <td>10</td>\n",
+       "      <td>1805</td>\n",
+       "      <td>MERY-SUR-SEINE, (Géog.) petite ville de France...</td>\n",
+       "      <td>MERY-SUR-SEINE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>36</td>\n",
+       "      <td>6</td>\n",
+       "      <td>4</td>\n",
+       "      <td>...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>ville</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>47</th>\n",
+       "      <td>volume11-3255</td>\n",
+       "      <td>11</td>\n",
+       "      <td>3255</td>\n",
+       "      <td>OUESSANT, (Géog. mod.) île de France dans l'Oc...</td>\n",
+       "      <td>OUESSANT</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>420</td>\n",
+       "      <td>8</td>\n",
+       "      <td>6</td>\n",
+       "      <td>...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>île</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>île</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>48 rows × 21 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         filename  volume  number  \\\n",
+       "0   volume02-1504       2    1504   \n",
+       "1   volume01-2599       1    2599   \n",
+       "2   volume01-1065       1    1065   \n",
+       "3   volume02-1391       2    1391   \n",
+       "4   volume01-4843       1    4843   \n",
+       "5   volume01-3831       1    3831   \n",
+       "6   volume01-5034       1    5034   \n",
+       "7   volume01-3079       1    3079   \n",
+       "8    volume02-363       2     363   \n",
+       "9   volume01-1279       1    1279   \n",
+       "10   volume02-705       2     705   \n",
+       "11  volume01-2810       1    2810   \n",
+       "12  volume02-1614       2    1614   \n",
+       "13  volume01-1565       1    1565   \n",
+       "14  volume01-5144       1    5144   \n",
+       "15  volume02-1564       2    1564   \n",
+       "16  volume15-2668      15    2668   \n",
+       "17  volume15-4368      15    4368   \n",
+       "18  volume07-2017       7    2017   \n",
+       "19  volume16-4274      16    4274   \n",
+       "20   volume10-381      10     381   \n",
+       "21   volume13-514      13     514   \n",
+       "22  volume12-3457      12    3457   \n",
+       "23  volume14-2693      14    2693   \n",
+       "24  volume17-1439      17    1439   \n",
+       "25   volume13-208      13     208   \n",
+       "26   volume09-524       9     524   \n",
+       "27  volume11-1060      11    1060   \n",
+       "28  volume12-2242      12    2242   \n",
+       "29  volume11-3735      11    3735   \n",
+       "30  volume14-2568      14    2568   \n",
+       "31   volume17-636      17     636   \n",
+       "32  volume02-6579       2    6579   \n",
+       "33  volume02-4075       2    4075   \n",
+       "34  volume02-4564       2    4564   \n",
+       "35  volume02-6305       2    6305   \n",
+       "36  volume03-3769       3    3769   \n",
+       "37  volume17-2104      17    2104   \n",
+       "38  volume04-2909       4    2909   \n",
+       "39  volume02-3396       2    3396   \n",
+       "40  volume03-2391       3    2391   \n",
+       "41  volume02-3250       2    3250   \n",
+       "42  volume09-2586       9    2586   \n",
+       "43   volume17-575      17     575   \n",
+       "44  volume10-3018      10    3018   \n",
+       "45  volume14-4758      14    4758   \n",
+       "46  volume10-1805      10    1805   \n",
+       "47  volume11-3255      11    3255   \n",
+       "\n",
+       "                                              content  \\\n",
+       "0   * BEIRE, (Géog.) petite ville de France, en Bo...   \n",
+       "1   * ANDONVILLE, (Géog. mod.) ville de France, gé...   \n",
+       "2   * Afrique, (Géog. mod.) petite ville de France...   \n",
+       "3   * BEAUVOISIS ou BEAUVAISIS, (Géog.) petit pays...   \n",
+       "4   * AUBETERRE (Géog.) ville de France, dans l'An...   \n",
+       "5   * ARGENCES, (Géog.) bourg de France en basse N...   \n",
+       "6   * AUNEAU (Géographie.) petite ville de France,...   \n",
+       "7   * ANTRAIN ou ENTRAINS, (Géog. mod.) petite vil...   \n",
+       "8   * BALLON (Géog.) ville de France, au diocese d...   \n",
+       "9   * Aigle, (Géog.) petite ville de France dans l...   \n",
+       "10  * BARENTON (Géog.) petite ville de France, dan...   \n",
+       "11  * ANNONAY, (Géog. mod.) petite ville de France...   \n",
+       "12  * BENAUGE, (Géog.) petite contrée de la Guienn...   \n",
+       "13  * ALBI, (Géog.) ville de France, capitale de  ...   \n",
+       "14  * AUTUN, (Géog.) ville de France au duché de B...   \n",
+       "15  * BELLE-ISLE, (Géog.) île de France à six lieu...   \n",
+       "16  STRENGENBACH ou STRENGBACH, le, (Géog. mod.) r...   \n",
+       "17  TARDÉNOIS, le (Géog. mod.) en latin du moyen â...   \n",
+       "18  Germain-Laval, (Saint-) Géog. ville de France ...   \n",
+       "19  Valence, (Géog. mod.) nos géographes disent pe...   \n",
+       "20  MARCELLIN, S. (Géog.) petite ville de France e...   \n",
+       "21  PORTO-CROS, (Géog. mod.) petite île de France ...   \n",
+       "22  PLOERMEL, (Géog. mod.) petite ville de France ...   \n",
+       "23  RUFFEC, (Géog. mod.) petite ville de France, d...   \n",
+       "24  VODABLE, (Géog. mod.) bourg de France dans l'A...   \n",
+       "25  PONTIVY, (Géog. mod.) petite ville de France, ...   \n",
+       "26  KAYSERBERG, (Géog.) c'est-à-dire mont de l'emp...   \n",
+       "27  Nogent-le-Rotrou, (Géog.) gros bourg de France...   \n",
+       "28  PICARDIE, la, (Géog. mod.) province de France,...   \n",
+       "29  Palais, (Géograph. mod.) petite place forte de...   \n",
+       "30  ROUSSILLON, le, (Géog. mod.) en latin Ruscinon...   \n",
+       "31  Vic-le-comte, (Géog. mod.) petite ville de Fra...   \n",
+       "32  CERNIN, (Saint) Géog. petite ville de France, ...   \n",
+       "33  Bruges, (Géog.) petite ville de France, dans l...   \n",
+       "34  CADENAC, (Géog.) petite ville de France dans l...   \n",
+       "35  CAYLAR, (le) Géog. petite ville de France, dan...   \n",
+       "36  CONDOM, (Géog. mod.) ville de France en Gascog...   \n",
+       "37  WASSELONNE, (Géog. mod.) bourg ou petite ville...   \n",
+       "38  CUSSET, (Géog. mod.) petite ville de France en...   \n",
+       "39  BOUTONNE, (Géog.) riviere de France, qui prend...   \n",
+       "40  Clermont, (Géog. mod.) petite ville de France,...   \n",
+       "41  Bourg-en-Bresse, (Géog.) ville de France, capi...   \n",
+       "42  LIMOURS, (Géog.) petite ville de France dans l...   \n",
+       "43  VEUDRE, (Géog. mod.) petite ville ou bourg de ...   \n",
+       "44  MONT-TRICHARD, (Géog.) ancienne petite ville d...   \n",
+       "45  SECLIN, (Géog. mod.) en latin moderne Sacilium...   \n",
+       "46  MERY-SUR-SEINE, (Géog.) petite ville de France...   \n",
+       "47  OUESSANT, (Géog. mod.) île de France dans l'Oc...   \n",
+       "\n",
+       "                          headword           normClass    author  nb Words  \\\n",
+       "0                            BEIRE          Géographie   Diderot        12   \n",
+       "1                       ANDONVILLE  Géographie moderne   Diderot        12   \n",
+       "2                          Afrique  Géographie moderne   Diderot        12   \n",
+       "3         BEAUVOISIS ou BEAUVAISIS          Géographie   Diderot        13   \n",
+       "4                        AUBETERRE          Géographie   Diderot        17   \n",
+       "5                         ARGENCES          Géographie   Diderot        17   \n",
+       "6                           AUNEAU          Géographie   Diderot        16   \n",
+       "7              ANTRAIN ou ENTRAINS  Géographie moderne   Diderot        15   \n",
+       "8                           BALLON          Géographie   Diderot        22   \n",
+       "9                            Aigle          Géographie   Diderot        19   \n",
+       "10                        BARENTON          Géographie   Diderot        20   \n",
+       "11                         ANNONAY  Géographie moderne   Diderot        20   \n",
+       "12                         BENAUGE          Géographie   Diderot        24   \n",
+       "13                            ALBI          Géographie   Diderot        25   \n",
+       "14                           AUTUN          Géographie   Diderot        27   \n",
+       "15                      BELLE-ISLE          Géographie   Diderot        28   \n",
+       "16  STRENGENBACH ou STRENGBACH, le  Géographie moderne  Jaucourt        30   \n",
+       "17                   TARDÉNOIS, le  Géographie moderne  Jaucourt        34   \n",
+       "18         Germain-Laval, (Saint-)          Géographie  Jaucourt        38   \n",
+       "19                         Valence  Géographie moderne  Jaucourt        32   \n",
+       "20                       MARCELLIN          Géographie  Jaucourt        55   \n",
+       "21                      PORTO-CROS  Géographie moderne  Jaucourt        48   \n",
+       "22                        PLOERMEL  Géographie moderne  Jaucourt        46   \n",
+       "23                          RUFFEC  Géographie moderne  Jaucourt        47   \n",
+       "24                         VODABLE  Géographie moderne  Jaucourt        76   \n",
+       "25                         PONTIVY  Géographie moderne  Jaucourt       145   \n",
+       "26                      KAYSERBERG          Géographie  Jaucourt       130   \n",
+       "27                Nogent-le-Rotrou          Géographie  Jaucourt       123   \n",
+       "28                    PICARDIE, la  Géographie moderne  Jaucourt       432   \n",
+       "29                          Palais  Géographie moderne  Jaucourt       515   \n",
+       "30                  ROUSSILLON, le  Géographie moderne  Jaucourt       554   \n",
+       "31                    Vic-le-comte  Géographie moderne  Jaucourt       274   \n",
+       "32                  CERNIN (Saint)          Géographie  unsigned        10   \n",
+       "33                          Bruges          Géographie  unsigned        14   \n",
+       "34                         CADENAC          Géographie  unsigned        14   \n",
+       "35                     CAYLAR (le)          Géographie  unsigned        12   \n",
+       "36                          CONDOM  Géographie moderne  unsigned        19   \n",
+       "37                      WASSELONNE  Géographie moderne  unsigned        19   \n",
+       "38                          CUSSET  Géographie moderne  unsigned        15   \n",
+       "39                        BOUTONNE          Géographie  unsigned        18   \n",
+       "40                        Clermont  Géographie moderne  unsigned        28   \n",
+       "41                 Bourg-en-Bresse          Géographie  unsigned        27   \n",
+       "42                         LIMOURS          Géographie  unsigned        26   \n",
+       "43                          VEUDRE  Géographie moderne  unsigned        23   \n",
+       "44                   MONT-TRICHARD          Géographie  unsigned        42   \n",
+       "45                          SECLIN  Géographie moderne  unsigned        46   \n",
+       "46                  MERY-SUR-SEINE          Géographie  unsigned        36   \n",
+       "47                        OUESSANT  Géographie moderne  unsigned       420   \n",
+       "\n",
+       "    nb EN  nb Name EDDA  ...  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "0       4             4  ...       2             2              0   \n",
+       "1       4             4  ...       3             2              0   \n",
+       "2       4             4  ...       2             2              0   \n",
+       "3       4             4  ...       1             1              0   \n",
+       "4       4             2  ...       1             1              0   \n",
+       "5       3             3  ...       1             1              0   \n",
+       "6       4             4  ...       2             2              0   \n",
+       "7       5             5  ...       2             2              0   \n",
+       "8       4             4  ...       3             2              0   \n",
+       "9       5             5  ...       2             2              0   \n",
+       "10      5             5  ...       3             3              0   \n",
+       "11      3             2  ...       1             1              0   \n",
+       "12      5             5  ...       4             4              0   \n",
+       "13      4             4  ...       1             1              0   \n",
+       "14      4             3  ...       2             2              0   \n",
+       "15      4             3  ...       3             3              0   \n",
+       "16      6             3  ...       1             1              0   \n",
+       "17      7             6  ...       2             1              1   \n",
+       "18      4             2  ...       2             2              0   \n",
+       "19      5             3  ...       2             2              0   \n",
+       "20      9             5  ...       3             2              1   \n",
+       "21      5             4  ...       3             2              0   \n",
+       "22      5             4  ...       2             2              0   \n",
+       "23      4             3  ...       1             1              0   \n",
+       "24      8             7  ...       4             3              0   \n",
+       "25     15            10  ...       3             3              0   \n",
+       "26     17             7  ...       7             2              2   \n",
+       "27     17             7  ...       9             2              2   \n",
+       "28     60            35  ...      14             5              3   \n",
+       "29     42            17  ...      16             6              2   \n",
+       "30     54            35  ...      22             9              3   \n",
+       "31     19             9  ...       9             3              1   \n",
+       "32      3             3  ...       1             1              0   \n",
+       "33      4             2  ...       2             1              0   \n",
+       "34      4             3  ...       2             2              0   \n",
+       "35      3             3  ...       2             2              0   \n",
+       "36      5             4  ...       2             2              0   \n",
+       "37      4             2  ...       1             1              0   \n",
+       "38      3             3  ...       1             1              0   \n",
+       "39      4             3  ...       1             1              0   \n",
+       "40      7             5  ...       1             1              0   \n",
+       "41      4             2  ...       3             3              0   \n",
+       "42      5             4  ...       3             2              1   \n",
+       "43      4             4  ...       1             1              0   \n",
+       "44      7             3  ...       2             1              1   \n",
+       "45      7             3  ...       4             2              0   \n",
+       "46      6             4  ...       2             1              0   \n",
+       "47      8             6  ...       4             3              0   \n",
+       "\n",
+       "    nb EN geocoded  nb EN EDDA geocoded       type latlong  latlong value  \\\n",
+       "0                2                    2      ville   False            NaN   \n",
+       "1                3                    3      ville   False            NaN   \n",
+       "2                3                    3      ville   False            NaN   \n",
+       "3                2                    2       pays   False            NaN   \n",
+       "4                3                    2      ville    True            NaN   \n",
+       "5                2                    2      ville    True            NaN   \n",
+       "6                4                    4      ville   False            NaN   \n",
+       "7                3                    3      ville   False            NaN   \n",
+       "8                1                    1      ville    True            NaN   \n",
+       "9                4                    4      ville   False            NaN   \n",
+       "10               5                    5      ville   False            NaN   \n",
+       "11               2                    2      ville    True            NaN   \n",
+       "12               3                    3       pays   False            NaN   \n",
+       "13               2                    2      ville    True            NaN   \n",
+       "14               2                    2      ville    True            NaN   \n",
+       "15               3                    3        île   False            NaN   \n",
+       "16               4                    2  hydronyme   False            NaN   \n",
+       "17               2                    2        NaN   False            NaN   \n",
+       "18               2                    2      ville    True            NaN   \n",
+       "19               2                    2        NaN   False            NaN   \n",
+       "20               2                    2      ville    True            NaN   \n",
+       "21               3                    2        île   False            NaN   \n",
+       "22               4                    4      ville    True            NaN   \n",
+       "23               2                    2      ville    True            NaN   \n",
+       "24               6                    6      ville    True            NaN   \n",
+       "25              11                    9      ville    True            NaN   \n",
+       "26               6                    4        NaN    True            NaN   \n",
+       "27               4                    3      ville    True            NaN   \n",
+       "28              16                   11       pays   False            NaN   \n",
+       "29               8                    7        NaN    True            NaN   \n",
+       "30              16                   14        NaN   False            NaN   \n",
+       "31               8                    4      ville    True            NaN   \n",
+       "32               2                    2      ville   False            NaN   \n",
+       "33               3                    2      ville   False            NaN   \n",
+       "34               1                    1      ville   False            NaN   \n",
+       "35               1                    1      ville   False            NaN   \n",
+       "36               3                    2      ville    True            NaN   \n",
+       "37               3                    2      ville   False            NaN   \n",
+       "38               2                    2      ville    True            NaN   \n",
+       "39               2                    2  hydronyme   False            NaN   \n",
+       "40               4                    3      ville   False            NaN   \n",
+       "41               3                    2      ville    True            NaN   \n",
+       "42               3                    3      ville    True            NaN   \n",
+       "43               1                    1      ville   False            NaN   \n",
+       "44               2                    2        NaN    True            NaN   \n",
+       "45               4                    2        NaN   False            NaN   \n",
+       "46               4                    3      ville    True            NaN   \n",
+       "47               5                    5        île    True            NaN   \n",
+       "\n",
+       "    de France  classifieur de France  \n",
+       "0        True                  ville  \n",
+       "1        True                  ville  \n",
+       "2        True                  ville  \n",
+       "3        True                   pays  \n",
+       "4        True                  ville  \n",
+       "5        True                  bourg  \n",
+       "6        True                  ville  \n",
+       "7        True                  ville  \n",
+       "8        True                  ville  \n",
+       "9        True                  ville  \n",
+       "10       True                  ville  \n",
+       "11       True                  ville  \n",
+       "12       True               province  \n",
+       "13       True                  ville  \n",
+       "14       True                  ville  \n",
+       "15       True                    île  \n",
+       "16       True                riviere  \n",
+       "17       True                   pays  \n",
+       "18       True                  ville  \n",
+       "19       True                  ville  \n",
+       "20       True                  ville  \n",
+       "21       True                    île  \n",
+       "22       True                  ville  \n",
+       "23       True                  ville  \n",
+       "24       True                  bourg  \n",
+       "25       True                  ville  \n",
+       "26       True                  ville  \n",
+       "27       True                  bourg  \n",
+       "28       True               province  \n",
+       "29       True                  place  \n",
+       "30       True               province  \n",
+       "31       True                  ville  \n",
+       "32       True                  ville  \n",
+       "33       True                  ville  \n",
+       "34       True                  ville  \n",
+       "35       True                  ville  \n",
+       "36       True                  ville  \n",
+       "37       True                  bourg  \n",
+       "38       True                  ville  \n",
+       "39       True                riviere  \n",
+       "40       True                  ville  \n",
+       "41       True                  ville  \n",
+       "42       True                  ville  \n",
+       "43       True                  ville  \n",
+       "44       True                  ville  \n",
+       "45       True                  bourg  \n",
+       "46       True                  ville  \n",
+       "47       True                    île  \n",
+       "\n",
+       "[48 rows x 21 columns]"
+      ]
+     },
+     "execution_count": 39,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sample5"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5.3 Saving the results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample5.to_csv('../Data/FranceGEOArticles-Sample4-23.08.17.tsv', sep='\\t', index=False)\n",
+    "#sample10.to_csv('../Data/FranceGEOArticles-Sample10-21.08.17.tsv', sep='\\t', index=False)"
+   ]
+  },
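+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal round-trip check, sketched on a hypothetical toy DataFrame standing in for `sample5` (the headwords and values are illustrative assumptions, not corpus data). A file written with `to_csv(..., sep='\\t', index=False)` should read back identically with `pd.read_csv(..., sep='\\t')`. An in-memory `io.StringIO` buffer replaces the real `../Data/` path so the cell runs anywhere."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import io\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical toy frame standing in for sample5 (illustrative values only)\n",
+    "toy = pd.DataFrame({'headword': ['HEADWORD_A', 'HEADWORD_B'],\n",
+    "                    'nb Words': [6, 10],\n",
+    "                    'classifieur de France': ['ville', 'bourg']})\n",
+    "\n",
+    "# Write to an in-memory buffer with the same options as the cell above\n",
+    "buf = io.StringIO()\n",
+    "toy.to_csv(buf, sep='\\t', index=False)\n",
+    "\n",
+    "# Rewind, read back, and check that nothing was lost or reordered\n",
+    "buf.seek(0)\n",
+    "reloaded = pd.read_csv(buf, sep='\\t')\n",
+    "print(reloaded.equals(toy))  # True"
+   ]
+  },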
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## List of Diderot's 100 longest geography articles"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "domaines_geographie = ['Géographie', 'Géographie moderne',\n",
+    "                 'Géographie ancienne', 'Géographie moderne | Géographie ancienne',\n",
+    "                 'Géographie ancienne | Géographie moderne', 'Géographie sacrée', 'Géographie sainte',\n",
+    "                 'Géographie | Histoire ancienne', 'Géographie historique', 'Géographie | Histoire',\n",
+    "                 'Histoire | Géographie', 'Géographie | Histoire naturelle', 'Géographie | Mythologie',\n",
+    "                 'Géographie ancienne | Mythologie', 'Histoire moderne | Géographie',\n",
+    "                 'Géographie ancienne | Géographie sainte', 'Géographie ancienne | Géographie sacrée',\n",
+    "                 'Géographie sacrée | Géographie ancienne', 'Géographie du moyen âge', 'Géographie des Arabes',\n",
+    "                 'Géographie | Commerce', 'Histoire | Géographie ancienne',\n",
+    "                 'Géographie | Histoire ancienne | Histoire moderne', 'Géographie ancienne | Littérature | Histoire',\n",
+    "                 'Histoire naturelle | Géographie', 'Géographie | Histoire ancienne | Mythologie',\n",
+    "                 'Géographie moderne | Commerce', 'Géographie ancienne | Géographie antique',\n",
+    "                 'Géographie moderne | Histoire', 'Géographie | Histoire monastique',\n",
+    "                 'Géographie ancienne | Géographie moderne | Mythologie', 'Géographie ancienne | Histoire',\n",
+    "                 'Géographie ancienne | Littérature | Mythologie', 'Géographie ancienne | Médailles'\n",
+    "                 ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "boolean_series = data.normClass.isin(domaines_geographie)\n",
+    "filtered_df = data[boolean_series]"
+   ]
+  },
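+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`isin` keeps only the rows whose `normClass` exactly matches one of the labels in `domaines_geographie`. Any label combination missing from that list is dropped without warning. The sketch below compares this exact match with a looser `str.contains('Géographie')` test on hypothetical toy labels. The looser filter is shown only for comparison; it is not the criterion used in this notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical labels (illustrative, not taken from the corpus)\n",
+    "toy = pd.DataFrame({'normClass': ['Géographie moderne',\n",
+    "                                  'Géographie | Marine',\n",
+    "                                  'Histoire naturelle',\n",
+    "                                  None]})\n",
+    "domaines = ['Géographie moderne']\n",
+    "\n",
+    "# Exact match against a whitelist, as in the cell above\n",
+    "exact = toy[toy.normClass.isin(domaines)]\n",
+    "\n",
+    "# Looser alternative: any label mentioning 'Géographie' (missing labels count as no match)\n",
+    "loose = toy[toy.normClass.str.contains('Géographie', na=False)]\n",
+    "\n",
+    "print(len(exact), len(loose))  # 1 2"
+   ]
+  },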
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(14452, 19)"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "filtered_df.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1578</th>\n",
+       "      <td>volume01-350</td>\n",
+       "      <td>1</td>\n",
+       "      <td>350</td>\n",
+       "      <td>* Abyde, (Géog. anc.) ville d'Egypte.</td>\n",
+       "      <td>Abyde</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12894</th>\n",
+       "      <td>volume01-539</td>\n",
+       "      <td>1</td>\n",
+       "      <td>539</td>\n",
+       "      <td>ACÉ, s. f. (Geog. anc.) ville de Phénicie. Voy...</td>\n",
+       "      <td>ACÉ</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>unsigned</td>\n",
+       "      <td>10</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>20659</th>\n",
+       "      <td>volume01-561</td>\n",
+       "      <td>1</td>\n",
+       "      <td>561</td>\n",
+       "      <td>* ACHAIE, s. m. (Geog. anc.) C'est le nom d'un...</td>\n",
+       "      <td>ACHAIE</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>46</td>\n",
+       "      <td>7</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17606</th>\n",
+       "      <td>volume01-581</td>\n",
+       "      <td>1</td>\n",
+       "      <td>581</td>\n",
+       "      <td>* ACHERON, s. m. (Géog. anc. &amp; Myth.) C'étoit ...</td>\n",
+       "      <td>ACHERON</td>\n",
+       "      <td>Géographie ancienne | Mythologie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>49</td>\n",
+       "      <td>7</td>\n",
+       "      <td>6</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2407</th>\n",
+       "      <td>volume01-582</td>\n",
+       "      <td>1</td>\n",
+       "      <td>582</td>\n",
+       "      <td>* ACHERUSE, s. f. (Géog. Hist. anc. &amp; Myth.) l...</td>\n",
+       "      <td>ACHERUSE</td>\n",
+       "      <td>Géographie | Histoire ancienne | Mythologie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>112</td>\n",
+       "      <td>7</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>hydronyme</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "           filename  volume  number  \\\n",
+       "1578   volume01-350       1     350   \n",
+       "12894  volume01-539       1     539   \n",
+       "20659  volume01-561       1     561   \n",
+       "17606  volume01-581       1     581   \n",
+       "2407   volume01-582       1     582   \n",
+       "\n",
+       "                                                 content  headword  \\\n",
+       "1578               * Abyde, (Géog. anc.) ville d'Egypte.     Abyde   \n",
+       "12894  ACÉ, s. f. (Geog. anc.) ville de Phénicie. Voy...       ACÉ   \n",
+       "20659  * ACHAIE, s. m. (Geog. anc.) C'est le nom d'un...    ACHAIE   \n",
+       "17606  * ACHERON, s. m. (Géog. anc. & Myth.) C'étoit ...   ACHERON   \n",
+       "2407   * ACHERUSE, s. f. (Géog. Hist. anc. & Myth.) l...  ACHERUSE   \n",
+       "\n",
+       "                                         normClass    author  nb Words  nb EN  \\\n",
+       "1578                           Géographie ancienne   Diderot         6      2   \n",
+       "12894                          Géographie ancienne  unsigned        10      2   \n",
+       "20659                          Géographie ancienne   Diderot        46      7   \n",
+       "17606             Géographie ancienne | Mythologie   Diderot        49      7   \n",
+       "2407   Géographie | Histoire ancienne | Mythologie   Diderot       112      7   \n",
+       "\n",
+       "       nb Name EDDA  nb Person  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "1578              2          0       1             1              0   \n",
+       "12894             1          0       1             1              0   \n",
+       "20659             5          0       3             3              0   \n",
+       "17606             6          0       3             3              0   \n",
+       "2407              4          1       1             1              0   \n",
+       "\n",
+       "       nb EN geocoded  nb EN EDDA geocoded       type  latlong  latlong value  \n",
+       "1578                0                    0      ville    False            NaN  \n",
+       "12894               0                    0      ville    False            NaN  \n",
+       "20659               0                    0        NaN    False            NaN  \n",
+       "17606               0                    0        NaN    False            NaN  \n",
+       "2407                0                    0  hydronyme    False            NaN  "
+      ]
+     },
+     "execution_count": 58,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "filtered_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_diderot = filtered_df[(filtered_df['author'] == 'Diderot')]\n",
+    "data_jaucourt = filtered_df[(filtered_df['author'] == 'Jaucourt')]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(1250, 19)\n",
+      "(8287, 19)\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(data_diderot.shape)\n",
+    "print(data_jaucourt.shape)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>913</th>\n",
+       "      <td>volume01-350</td>\n",
+       "      <td>1</td>\n",
+       "      <td>350</td>\n",
+       "      <td>* Abyde, (Géog. anc.) ville d'Egypte.</td>\n",
+       "      <td>Abyde</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11935</th>\n",
+       "      <td>volume01-561</td>\n",
+       "      <td>1</td>\n",
+       "      <td>561</td>\n",
+       "      <td>* ACHAIE, s. m. (Geog. anc.) C'est le nom d'un...</td>\n",
+       "      <td>ACHAIE</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>46</td>\n",
+       "      <td>7</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10142</th>\n",
+       "      <td>volume01-581</td>\n",
+       "      <td>1</td>\n",
+       "      <td>581</td>\n",
+       "      <td>* ACHERON, s. m. (Géog. anc. &amp; Myth.) C'étoit ...</td>\n",
+       "      <td>ACHERON</td>\n",
+       "      <td>Géographie ancienne | Mythologie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>49</td>\n",
+       "      <td>7</td>\n",
+       "      <td>6</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1400</th>\n",
+       "      <td>volume01-582</td>\n",
+       "      <td>1</td>\n",
+       "      <td>582</td>\n",
+       "      <td>* ACHERUSE, s. f. (Géog. Hist. anc. &amp; Myth.) l...</td>\n",
+       "      <td>ACHERUSE</td>\n",
+       "      <td>Géographie | Histoire ancienne | Mythologie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>112</td>\n",
+       "      <td>7</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>hydronyme</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14193</th>\n",
+       "      <td>volume01-591</td>\n",
+       "      <td>1</td>\n",
+       "      <td>591</td>\n",
+       "      <td>* ACHILLEA, s. f. (Géog. anc.) Isle du Pont-Eu...</td>\n",
+       "      <td>ACHILLEA</td>\n",
+       "      <td>Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>19</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "           filename  volume  number  \\\n",
+       "913    volume01-350       1     350   \n",
+       "11935  volume01-561       1     561   \n",
+       "10142  volume01-581       1     581   \n",
+       "1400   volume01-582       1     582   \n",
+       "14193  volume01-591       1     591   \n",
+       "\n",
+       "                                                 content  headword  \\\n",
+       "913                * Abyde, (Géog. anc.) ville d'Egypte.     Abyde   \n",
+       "11935  * ACHAIE, s. m. (Geog. anc.) C'est le nom d'un...    ACHAIE   \n",
+       "10142  * ACHERON, s. m. (Géog. anc. & Myth.) C'étoit ...   ACHERON   \n",
+       "1400   * ACHERUSE, s. f. (Géog. Hist. anc. & Myth.) l...  ACHERUSE   \n",
+       "14193  * ACHILLEA, s. f. (Géog. anc.) Isle du Pont-Eu...  ACHILLEA   \n",
+       "\n",
+       "                                         normClass   author  nb Words  nb EN  \\\n",
+       "913                            Géographie ancienne  Diderot         6      2   \n",
+       "11935                          Géographie ancienne  Diderot        46      7   \n",
+       "10142             Géographie ancienne | Mythologie  Diderot        49      7   \n",
+       "1400   Géographie | Histoire ancienne | Mythologie  Diderot       112      7   \n",
+       "14193                          Géographie ancienne  Diderot        19      4   \n",
+       "\n",
+       "       nb Name EDDA  nb Person  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "913               2          0       1             1              0   \n",
+       "11935             5          0       3             3              0   \n",
+       "10142             6          0       3             3              0   \n",
+       "1400              4          1       1             1              0   \n",
+       "14193             2          1       0             0              0   \n",
+       "\n",
+       "       nb EN geocoded  nb EN EDDA geocoded       type  latlong  latlong value  \n",
+       "913                 1                    1      ville    False            NaN  \n",
+       "11935               1                    1        NaN    False            NaN  \n",
+       "10142               3                    2        NaN    False            NaN  \n",
+       "1400                1                    1  hydronyme    False            NaN  \n",
+       "14193               0                    0        NaN    False            NaN  "
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "data_diderot.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sort the geography articles by length (word count), longest first\n",
+    "sample_long_diderot = data_diderot.sort_values(by=['nb Words'], ascending=False)\n",
+    "sample_long_jaucourt = data_jaucourt.sort_values(by=['nb Words'], ascending=False)"
+   ]
+  },
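+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The heading announces the 100 longest articles, but the cell above only sorts them. Assuming length is measured by `nb Words`, `DataFrame.nlargest` returns the top rows directly. It matches `sort_values(ascending=False).head(n)` when there are no ties at the cut-off. The toy frame and `n=2` below are illustrative stand-ins for `data_diderot` and 100."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Hypothetical stand-in for data_diderot (illustrative word counts)\n",
+    "toy = pd.DataFrame({'headword': ['A', 'B', 'C', 'D'],\n",
+    "                    'nb Words': [4, 46, 112, 6]})\n",
+    "\n",
+    "# Top-n rows by word count (n=2 here; 100 in the heading)\n",
+    "top = toy.nlargest(2, 'nb Words')\n",
+    "same = toy.sort_values(by=['nb Words'], ascending=False).head(2)\n",
+    "\n",
+    "print(top['headword'].tolist())  # ['C', 'B']\n",
+    "print(top.equals(same))  # True"
+   ]
+  },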
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>9498</th>\n",
+       "      <td>volume02-2049</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2049</td>\n",
+       "      <td>* BILINLOKA, (Géog.) ville de Moldavie.</td>\n",
+       "      <td>BILINLOKA</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8363</th>\n",
+       "      <td>volume01-1638</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1638</td>\n",
+       "      <td>* ALEGRE, (Géog.) Voyez Allegre.</td>\n",
+       "      <td>ALEGRE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7947</th>\n",
+       "      <td>volume01-1637</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1637</td>\n",
+       "      <td>* ALEGRANIA, (Géog.) Voyez Allegrania.</td>\n",
+       "      <td>ALEGRANIA</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5877</th>\n",
+       "      <td>volume01-3653</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3653</td>\n",
+       "      <td>* ARCÉE, (Géog.) Voyez Petra.</td>\n",
+       "      <td>ARCÉE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10080</th>\n",
+       "      <td>volume01-3501</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3501</td>\n",
+       "      <td>* ARACLEA. (Géog.) Voyez Héraclée.</td>\n",
+       "      <td>ARACLEA</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "            filename  volume  number                                  content  \\\n",
+       "9498   volume02-2049       2    2049  * BILINLOKA, (Géog.) ville de Moldavie.   \n",
+       "8363   volume01-1638       1    1638         * ALEGRE, (Géog.) Voyez Allegre.   \n",
+       "7947   volume01-1637       1    1637   * ALEGRANIA, (Géog.) Voyez Allegrania.   \n",
+       "5877   volume01-3653       1    3653            * ARCÉE, (Géog.) Voyez Petra.   \n",
+       "10080  volume01-3501       1    3501       * ARACLEA. (Géog.) Voyez Héraclée.   \n",
+       "\n",
+       "        headword   normClass   author  nb Words  nb EN  nb Name EDDA  \\\n",
+       "9498   BILINLOKA  Géographie  Diderot         5      2             2   \n",
+       "8363      ALEGRE  Géographie  Diderot         4      2             1   \n",
+       "7947   ALEGRANIA  Géographie  Diderot         4      2             2   \n",
+       "5877       ARCÉE  Géographie  Diderot         4      2             2   \n",
+       "10080    ARACLEA  Géographie  Diderot         4      2             2   \n",
+       "\n",
+       "       nb Person  nb ENE  nb ENE Place  nb ENE Person  nb EN geocoded  \\\n",
+       "9498           0       1             1              0               1   \n",
+       "8363           0       0             0              0               2   \n",
+       "7947           0       0             0              0               0   \n",
+       "5877           0       0             0              0               1   \n",
+       "10080          0       0             0              0               0   \n",
+       "\n",
+       "       nb EN EDDA geocoded   type  latlong  latlong value  \n",
+       "9498                     1  ville    False            NaN  \n",
+       "8363                     1    NaN    False            NaN  \n",
+       "7947                     0    NaN    False            NaN  \n",
+       "5877                     1    NaN    False            NaN  \n",
+       "10080                    0    NaN    False            NaN  "
+      ]
+     },
+     "execution_count": 34,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sample_long_diderot.tail()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Keep only the 100 longest articles for each author (frames are sorted by 'nb Words', descending)\n",
+    "sample_long_diderot = sample_long_diderot.head(100)\n",
+    "sample_long_jaucourt = sample_long_jaucourt.head(100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>11752</th>\n",
+       "      <td>volume02-1783</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1783</td>\n",
+       "      <td>* BESANÇON, (Géog.) ville de France, capitale ...</td>\n",
+       "      <td>BESANÇON</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>708</td>\n",
+       "      <td>15</td>\n",
+       "      <td>12</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2377</th>\n",
+       "      <td>volume02-609</td>\n",
+       "      <td>2</td>\n",
+       "      <td>609</td>\n",
+       "      <td>* BARBARIE, s. f. (Géog.) grande contrée d'Afr...</td>\n",
+       "      <td>BARBARIE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>570</td>\n",
+       "      <td>44</td>\n",
+       "      <td>36</td>\n",
+       "      <td>1</td>\n",
+       "      <td>7</td>\n",
+       "      <td>6</td>\n",
+       "      <td>0</td>\n",
+       "      <td>14</td>\n",
+       "      <td>11</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13331</th>\n",
+       "      <td>volume01-5149</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5149</td>\n",
+       "      <td>* AUVERGNE (Géographie.) province de France d'...</td>\n",
+       "      <td>AUVERGNE</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>459</td>\n",
+       "      <td>44</td>\n",
+       "      <td>29</td>\n",
+       "      <td>0</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>24</td>\n",
+       "      <td>16</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6697</th>\n",
+       "      <td>volume01-3489</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3489</td>\n",
+       "      <td>* ARABIE, (Géog. anc. &amp; mod.) pays considérabl...</td>\n",
+       "      <td>ARABIE</td>\n",
+       "      <td>Géographie moderne | Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>459</td>\n",
+       "      <td>54</td>\n",
+       "      <td>35</td>\n",
+       "      <td>2</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>18</td>\n",
+       "      <td>16</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13426</th>\n",
+       "      <td>volume02-1650</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1650</td>\n",
+       "      <td>* Benin, (Géog.) capitale du royaume de même n...</td>\n",
+       "      <td>Benin</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>458</td>\n",
+       "      <td>13</td>\n",
+       "      <td>10</td>\n",
+       "      <td>1</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>5</td>\n",
+       "      <td>5</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7824</th>\n",
+       "      <td>volume02-1517</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1517</td>\n",
+       "      <td>* BELCASTRO, (Géog. anc. &amp; mod.) ville d'Itali...</td>\n",
+       "      <td>BELCASTRO</td>\n",
+       "      <td>Géographie moderne | Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>62</td>\n",
+       "      <td>9</td>\n",
+       "      <td>6</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11257</th>\n",
+       "      <td>volume05-1431</td>\n",
+       "      <td>5</td>\n",
+       "      <td>1431</td>\n",
+       "      <td>* EDESSE, s. f. (Géog. anc. &amp; mod.) ville de l...</td>\n",
+       "      <td>EDESSE</td>\n",
+       "      <td>Géographie moderne | Géographie ancienne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>62</td>\n",
+       "      <td>11</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2</td>\n",
+       "      <td>3</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6631</th>\n",
+       "      <td>volume01-3754</td>\n",
+       "      <td>1</td>\n",
+       "      <td>3754</td>\n",
+       "      <td>* ARDÉE, (Géog. anc. &amp; Myth.) ville capitale d...</td>\n",
+       "      <td>ARDÉE</td>\n",
+       "      <td>Géographie ancienne | Mythologie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>61</td>\n",
+       "      <td>6</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5248</th>\n",
+       "      <td>volume02-1769</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1769</td>\n",
+       "      <td>* BERRI, (Géog.) province de France, avec titr...</td>\n",
+       "      <td>BERRI</td>\n",
+       "      <td>Géographie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>60</td>\n",
+       "      <td>13</td>\n",
+       "      <td>11</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "      <td>pays</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8692</th>\n",
+       "      <td>volume01-4577</td>\n",
+       "      <td>1</td>\n",
+       "      <td>4577</td>\n",
+       "      <td>* ATACAMA, (Géog. mod.) port de mer, dans l'Am...</td>\n",
+       "      <td>ATACAMA</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>60</td>\n",
+       "      <td>5</td>\n",
+       "      <td>4</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>100 rows × 19 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "            filename  volume  number  \\\n",
+       "11752  volume02-1783       2    1783   \n",
+       "2377    volume02-609       2     609   \n",
+       "13331  volume01-5149       1    5149   \n",
+       "6697   volume01-3489       1    3489   \n",
+       "13426  volume02-1650       2    1650   \n",
+       "...              ...     ...     ...   \n",
+       "7824   volume02-1517       2    1517   \n",
+       "11257  volume05-1431       5    1431   \n",
+       "6631   volume01-3754       1    3754   \n",
+       "5248   volume02-1769       2    1769   \n",
+       "8692   volume01-4577       1    4577   \n",
+       "\n",
+       "                                                 content   headword  \\\n",
+       "11752  * BESANÇON, (Géog.) ville de France, capitale ...   BESANÇON   \n",
+       "2377   * BARBARIE, s. f. (Géog.) grande contrée d'Afr...   BARBARIE   \n",
+       "13331  * AUVERGNE (Géographie.) province de France d'...   AUVERGNE   \n",
+       "6697   * ARABIE, (Géog. anc. & mod.) pays considérabl...     ARABIE   \n",
+       "13426  * Benin, (Géog.) capitale du royaume de même n...      Benin   \n",
+       "...                                                  ...        ...   \n",
+       "7824   * BELCASTRO, (Géog. anc. & mod.) ville d'Itali...  BELCASTRO   \n",
+       "11257  * EDESSE, s. f. (Géog. anc. & mod.) ville de l...     EDESSE   \n",
+       "6631   * ARDÉE, (Géog. anc. & Myth.) ville capitale d...      ARDÉE   \n",
+       "5248   * BERRI, (Géog.) province de France, avec titr...      BERRI   \n",
+       "8692   * ATACAMA, (Géog. mod.) port de mer, dans l'Am...    ATACAMA   \n",
+       "\n",
+       "                                      normClass   author  nb Words  nb EN  \\\n",
+       "11752                                Géographie  Diderot       708     15   \n",
+       "2377                                 Géographie  Diderot       570     44   \n",
+       "13331                                Géographie  Diderot       459     44   \n",
+       "6697   Géographie moderne | Géographie ancienne  Diderot       459     54   \n",
+       "13426                                Géographie  Diderot       458     13   \n",
+       "...                                         ...      ...       ...    ...   \n",
+       "7824   Géographie moderne | Géographie ancienne  Diderot        62      9   \n",
+       "11257  Géographie moderne | Géographie ancienne  Diderot        62     11   \n",
+       "6631           Géographie ancienne | Mythologie  Diderot        61      6   \n",
+       "5248                                 Géographie  Diderot        60     13   \n",
+       "8692                         Géographie moderne  Diderot        60      5   \n",
+       "\n",
+       "       nb Name EDDA  nb Person  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "11752            12          1       5             2              0   \n",
+       "2377             36          1       7             6              0   \n",
+       "13331            29          0       5             5              0   \n",
+       "6697             35          2      10             5              0   \n",
+       "13426            10          1       5             2              2   \n",
+       "...             ...        ...     ...           ...            ...   \n",
+       "7824              6          3       3             3              0   \n",
+       "11257             5          2       3             1              2   \n",
+       "6631              4          1       2             1              0   \n",
+       "5248             11          0       1             1              0   \n",
+       "8692              4          0       0             0              0   \n",
+       "\n",
+       "       nb EN geocoded  nb EN EDDA geocoded   type  latlong  latlong value  \n",
+       "11752               5                    4  ville     True            NaN  \n",
+       "2377               14                   11   pays    False            NaN  \n",
+       "13331              24                   16   pays    False            NaN  \n",
+       "6697               18                   16   pays     True            NaN  \n",
+       "13426               5                    5  ville     True            NaN  \n",
+       "...               ...                  ...    ...      ...            ...  \n",
+       "7824                3                    3  ville     True            NaN  \n",
+       "11257               2                    1  ville    False            NaN  \n",
+       "6631                2                    2  ville    False            NaN  \n",
+       "5248                3                    3   pays    False            NaN  \n",
+       "8692                2                    1    NaN     True            NaN  \n",
+       "\n",
+       "[100 rows x 19 columns]"
+      ]
+     },
+     "execution_count": 36,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sample_long_diderot.head(100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "174.03752865934595\n",
+      "33.66\n",
+      "64.0\n",
+      "21.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Mean, then median, word count ('nb Words') for Jaucourt and Diderot\n",
+    "print(data_jaucourt['nb Words'].mean())\n",
+    "print(data_diderot['nb Words'].mean())\n",
+    "\n",
+    "print(data_jaucourt['nb Words'].median())\n",
+    "print(data_diderot['nb Words'].median())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_long_diderot.to_csv('../Data/ArticlesLongGeoDiderot-Sample100-21.10.11.tsv', sep='\\t', index=False)\n",
+    "sample_long_jaucourt.to_csv('../Data/ArticlesLongGeoJaucourt-Sample100-21.10.11.tsv', sep='\\t', index=False)\n",
+    "#sample10.to_csv('../Data/FranceGEOArticles-Sample10-21.08.17.tsv', sep='\\t', index=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Every article in data signed by each author, whatever its class\n",
+    "data_d = data[(data['author'] == 'Diderot')]\n",
+    "data_j = data[(data['author'] == 'Jaucourt')]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 64,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(5509, 19)\n",
+      "(17266, 19)\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(data_d.shape)\n",
+    "print(data_j.shape)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "282.0755820688058\n",
+      "205.50172445089854\n",
+      "90.0\n",
+      "43.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Same statistics (mean, then median of 'nb Words') computed on data_j and data_d\n",
+    "print(data_j['nb Words'].mean())\n",
+    "print(data_d['nb Words'].mean())\n",
+    "\n",
+    "print(data_j['nb Words'].median())\n",
+    "print(data_d['nb Words'].median())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sort each author's articles by word count, longest first\n",
+    "sample_long_d = data_d.sort_values(by=['nb Words'], ascending=False)\n",
+    "sample_long_j = data_j.sort_values(by=['nb Words'], ascending=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_long_d = sample_long_d.head(100)\n",
+    "sample_long_j = sample_long_j.head(100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>14484</th>\n",
+       "      <td>volume05-2355</td>\n",
+       "      <td>5</td>\n",
+       "      <td>2355</td>\n",
+       "      <td>* ENCYCLOPÉDIE, s. f. (Philosoph.) Ce mot sign...</td>\n",
+       "      <td>ENCYCLOPÉDIE</td>\n",
+       "      <td>Philosophie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>36933</td>\n",
+       "      <td>472</td>\n",
+       "      <td>333</td>\n",
+       "      <td>65</td>\n",
+       "      <td>109</td>\n",
+       "      <td>5</td>\n",
+       "      <td>3</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>27187</th>\n",
+       "      <td>volume09-118</td>\n",
+       "      <td>9</td>\n",
+       "      <td>118</td>\n",
+       "      <td>* Juifs, Philosophie des, (Hist. de la Philoso...</td>\n",
+       "      <td>Juifs, Philosophie des</td>\n",
+       "      <td>Histoire de la philosophie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>34746</td>\n",
+       "      <td>1295</td>\n",
+       "      <td>623</td>\n",
+       "      <td>228</td>\n",
+       "      <td>349</td>\n",
+       "      <td>30</td>\n",
+       "      <td>18</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15784</th>\n",
+       "      <td>volume04-950</td>\n",
+       "      <td>4</td>\n",
+       "      <td>950</td>\n",
+       "      <td>* Corderie, (Ord. encyclop. Entend. Mémoire. H...</td>\n",
+       "      <td>Corderie</td>\n",
+       "      <td>Corderie</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>32333</td>\n",
+       "      <td>434</td>\n",
+       "      <td>334</td>\n",
+       "      <td>18</td>\n",
+       "      <td>120</td>\n",
+       "      <td>4</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14131</th>\n",
+       "      <td>volume05-1220</td>\n",
+       "      <td>5</td>\n",
+       "      <td>1220</td>\n",
+       "      <td>* ECLECTISME, s. m. (Hist. de la Philosophie a...</td>\n",
+       "      <td>ECLECTISME</td>\n",
+       "      <td>Histoire de la philosophie ancienne | Histoire...</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>30178</td>\n",
+       "      <td>735</td>\n",
+       "      <td>356</td>\n",
+       "      <td>70</td>\n",
+       "      <td>212</td>\n",
+       "      <td>16</td>\n",
+       "      <td>6</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>130</th>\n",
+       "      <td>volume03-709</td>\n",
+       "      <td>3</td>\n",
+       "      <td>709</td>\n",
+       "      <td>* CHAPEAU, s. m. (Art méchan.) ce terme 2 deux...</td>\n",
+       "      <td>CHAPEAU</td>\n",
+       "      <td>Art méchanique</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>19399</td>\n",
+       "      <td>252</td>\n",
+       "      <td>213</td>\n",
+       "      <td>7</td>\n",
+       "      <td>65</td>\n",
+       "      <td>6</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>24730</th>\n",
+       "      <td>volume02-5076</td>\n",
+       "      <td>2</td>\n",
+       "      <td>5076</td>\n",
+       "      <td>* CANAL ARTIFICIEL, (Hist. &amp; Architecture.) li...</td>\n",
+       "      <td>CANAL ARTIFICIEL</td>\n",
+       "      <td>Histoire | Architecture</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>1934</td>\n",
+       "      <td>153</td>\n",
+       "      <td>109</td>\n",
+       "      <td>13</td>\n",
+       "      <td>41</td>\n",
+       "      <td>10</td>\n",
+       "      <td>4</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25356</th>\n",
+       "      <td>volume08-1114</td>\n",
+       "      <td>8</td>\n",
+       "      <td>1114</td>\n",
+       "      <td>* HIÉRARCHIE, s. f. (Hist. ecclésiast.) il se ...</td>\n",
+       "      <td>HIÉRARCHIE</td>\n",
+       "      <td>Histoire ecclésiastique</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>1912</td>\n",
+       "      <td>47</td>\n",
+       "      <td>11</td>\n",
+       "      <td>3</td>\n",
+       "      <td>16</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>22363</th>\n",
+       "      <td>volume02-4565</td>\n",
+       "      <td>2</td>\n",
+       "      <td>4565</td>\n",
+       "      <td>* CADENAT, s. m. est une espece de petite serr...</td>\n",
+       "      <td>CADENAT</td>\n",
+       "      <td>unclassified</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>1839</td>\n",
+       "      <td>58</td>\n",
+       "      <td>39</td>\n",
+       "      <td>1</td>\n",
+       "      <td>27</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5752</th>\n",
+       "      <td>volume03-774</td>\n",
+       "      <td>3</td>\n",
+       "      <td>774</td>\n",
+       "      <td>* CHAR, s. m. (Hist. anc. &amp; mod.) On donnoit a...</td>\n",
+       "      <td>CHAR</td>\n",
+       "      <td>Histoire ancienne | Histoire moderne</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>1810</td>\n",
+       "      <td>71</td>\n",
+       "      <td>31</td>\n",
+       "      <td>14</td>\n",
+       "      <td>17</td>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4041</th>\n",
+       "      <td>volume03-2171</td>\n",
+       "      <td>3</td>\n",
+       "      <td>2171</td>\n",
+       "      <td>* CITOYEN, s. m. (Hist. anc. mod. Droit publ.)...</td>\n",
+       "      <td>CITOYEN</td>\n",
+       "      <td>Droit public | Histoire moderne | Histoire anc...</td>\n",
+       "      <td>Diderot</td>\n",
+       "      <td>1758</td>\n",
+       "      <td>48</td>\n",
+       "      <td>22</td>\n",
+       "      <td>9</td>\n",
+       "      <td>13</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>100 rows × 19 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "            filename  volume  number  \\\n",
+       "14484  volume05-2355       5    2355   \n",
+       "27187   volume09-118       9     118   \n",
+       "15784   volume04-950       4     950   \n",
+       "14131  volume05-1220       5    1220   \n",
+       "130     volume03-709       3     709   \n",
+       "...              ...     ...     ...   \n",
+       "24730  volume02-5076       2    5076   \n",
+       "25356  volume08-1114       8    1114   \n",
+       "22363  volume02-4565       2    4565   \n",
+       "5752    volume03-774       3     774   \n",
+       "4041   volume03-2171       3    2171   \n",
+       "\n",
+       "                                                 content  \\\n",
+       "14484  * ENCYCLOPÉDIE, s. f. (Philosoph.) Ce mot sign...   \n",
+       "27187  * Juifs, Philosophie des, (Hist. de la Philoso...   \n",
+       "15784  * Corderie, (Ord. encyclop. Entend. Mémoire. H...   \n",
+       "14131  * ECLECTISME, s. m. (Hist. de la Philosophie a...   \n",
+       "130    * CHAPEAU, s. m. (Art méchan.) ce terme 2 deux...   \n",
+       "...                                                  ...   \n",
+       "24730  * CANAL ARTIFICIEL, (Hist. & Architecture.) li...   \n",
+       "25356  * HIÉRARCHIE, s. f. (Hist. ecclésiast.) il se ...   \n",
+       "22363  * CADENAT, s. m. est une espece de petite serr...   \n",
+       "5752   * CHAR, s. m. (Hist. anc. & mod.) On donnoit a...   \n",
+       "4041   * CITOYEN, s. m. (Hist. anc. mod. Droit publ.)...   \n",
+       "\n",
+       "                     headword  \\\n",
+       "14484            ENCYCLOPÉDIE   \n",
+       "27187  Juifs, Philosophie des   \n",
+       "15784                Corderie   \n",
+       "14131              ECLECTISME   \n",
+       "130                   CHAPEAU   \n",
+       "...                       ...   \n",
+       "24730        CANAL ARTIFICIEL   \n",
+       "25356              HIÉRARCHIE   \n",
+       "22363                 CADENAT   \n",
+       "5752                     CHAR   \n",
+       "4041                  CITOYEN   \n",
+       "\n",
+       "                                               normClass   author  nb Words  \\\n",
+       "14484                                        Philosophie  Diderot     36933   \n",
+       "27187                         Histoire de la philosophie  Diderot     34746   \n",
+       "15784                                           Corderie  Diderot     32333   \n",
+       "14131  Histoire de la philosophie ancienne | Histoire...  Diderot     30178   \n",
+       "130                                       Art méchanique  Diderot     19399   \n",
+       "...                                                  ...      ...       ...   \n",
+       "24730                            Histoire | Architecture  Diderot      1934   \n",
+       "25356                            Histoire ecclésiastique  Diderot      1912   \n",
+       "22363                                       unclassified  Diderot      1839   \n",
+       "5752                Histoire ancienne | Histoire moderne  Diderot      1810   \n",
+       "4041   Droit public | Histoire moderne | Histoire anc...  Diderot      1758   \n",
+       "\n",
+       "       nb EN  nb Name EDDA  nb Person  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "14484    472           333         65     109             5              3   \n",
+       "27187   1295           623        228     349            30             18   \n",
+       "15784    434           334         18     120             4              1   \n",
+       "14131    735           356         70     212            16              6   \n",
+       "130      252           213          7      65             6              2   \n",
+       "...      ...           ...        ...     ...           ...            ...   \n",
+       "24730    153           109         13      41            10              4   \n",
+       "25356     47            11          3      16             0              1   \n",
+       "22363     58            39          1      27             0              0   \n",
+       "5752      71            31         14      17             0              2   \n",
+       "4041      48            22          9      13             2              0   \n",
+       "\n",
+       "       nb EN geocoded  nb EN EDDA geocoded type  latlong  latlong value  \n",
+       "14484               0                    0  NaN    False            NaN  \n",
+       "27187               0                    0  NaN    False            NaN  \n",
+       "15784               0                    0  NaN    False            NaN  \n",
+       "14131               0                    0  NaN    False            NaN  \n",
+       "130                 0                    0  NaN    False            NaN  \n",
+       "...               ...                  ...  ...      ...            ...  \n",
+       "24730               0                    0  NaN    False            NaN  \n",
+       "25356               0                    0  NaN    False            NaN  \n",
+       "22363               0                    0  NaN    False            NaN  \n",
+       "5752                0                    0  NaN    False            NaN  \n",
+       "4041                0                    0  NaN    False            NaN  \n",
+       "\n",
+       "[100 rows x 19 columns]"
+      ]
+     },
+     "execution_count": 72,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sample_long_d.head(100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 73,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>filename</th>\n",
+       "      <th>volume</th>\n",
+       "      <th>number</th>\n",
+       "      <th>content</th>\n",
+       "      <th>headword</th>\n",
+       "      <th>normClass</th>\n",
+       "      <th>author</th>\n",
+       "      <th>nb Words</th>\n",
+       "      <th>nb EN</th>\n",
+       "      <th>nb Name EDDA</th>\n",
+       "      <th>nb Person</th>\n",
+       "      <th>nb ENE</th>\n",
+       "      <th>nb ENE Place</th>\n",
+       "      <th>nb ENE Person</th>\n",
+       "      <th>nb EN geocoded</th>\n",
+       "      <th>nb EN EDDA geocoded</th>\n",
+       "      <th>type</th>\n",
+       "      <th>latlong</th>\n",
+       "      <th>latlong value</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>48722</th>\n",
+       "      <td>volume11-4606</td>\n",
+       "      <td>11</td>\n",
+       "      <td>4606</td>\n",
+       "      <td>PARIS, (Géog. mod.) ville capitale du royaume ...</td>\n",
+       "      <td>PARIS</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>20530</td>\n",
+       "      <td>1195</td>\n",
+       "      <td>435</td>\n",
+       "      <td>284</td>\n",
+       "      <td>547</td>\n",
+       "      <td>151</td>\n",
+       "      <td>47</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>True</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>45431</th>\n",
+       "      <td>volume10-1265</td>\n",
+       "      <td>10</td>\n",
+       "      <td>1265</td>\n",
+       "      <td>MÉDECINE, s. f. (Art &amp; Science.) La Médecine e...</td>\n",
+       "      <td>MÉDECINE</td>\n",
+       "      <td>Arts | Science</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>19687</td>\n",
+       "      <td>610</td>\n",
+       "      <td>232</td>\n",
+       "      <td>175</td>\n",
+       "      <td>166</td>\n",
+       "      <td>14</td>\n",
+       "      <td>7</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>53811</th>\n",
+       "      <td>volume16-3477</td>\n",
+       "      <td>16</td>\n",
+       "      <td>3477</td>\n",
+       "      <td>TRIUMVIRAT, s. m. (Hist. rom.) c'est le nom la...</td>\n",
+       "      <td>TRIUMVIRAT</td>\n",
+       "      <td>Histoire romaine</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>17669</td>\n",
+       "      <td>865</td>\n",
+       "      <td>269</td>\n",
+       "      <td>259</td>\n",
+       "      <td>255</td>\n",
+       "      <td>25</td>\n",
+       "      <td>17</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>52290</th>\n",
+       "      <td>volume14-4631</td>\n",
+       "      <td>14</td>\n",
+       "      <td>4631</td>\n",
+       "      <td>Sculpteurs anciens, (Sculpt. antiq.) comme les...</td>\n",
+       "      <td>Sculpteurs anciens</td>\n",
+       "      <td>Sculpture antique</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>16692</td>\n",
+       "      <td>1121</td>\n",
+       "      <td>294</td>\n",
+       "      <td>282</td>\n",
+       "      <td>349</td>\n",
+       "      <td>51</td>\n",
+       "      <td>24</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>34775</th>\n",
+       "      <td>volume12-1392</td>\n",
+       "      <td>12</td>\n",
+       "      <td>1392</td>\n",
+       "      <td>Pere de l'Église, (Hist. ecclésiast.) on nomme...</td>\n",
+       "      <td>Pere de l'Église</td>\n",
+       "      <td>Histoire ecclésiastique</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>13862</td>\n",
+       "      <td>760</td>\n",
+       "      <td>283</td>\n",
+       "      <td>167</td>\n",
+       "      <td>231</td>\n",
+       "      <td>15</td>\n",
+       "      <td>20</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>70369</th>\n",
+       "      <td>volume17-2067</td>\n",
+       "      <td>17</td>\n",
+       "      <td>2067</td>\n",
+       "      <td>WANTAGE, (Géog. mod.) bourg à marché d'Anglete...</td>\n",
+       "      <td>WANTAGE</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>4143</td>\n",
+       "      <td>171</td>\n",
+       "      <td>66</td>\n",
+       "      <td>36</td>\n",
+       "      <td>51</td>\n",
+       "      <td>7</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>64126</th>\n",
+       "      <td>volume17-1690</td>\n",
+       "      <td>17</td>\n",
+       "      <td>1690</td>\n",
+       "      <td>VOORHOUT, (Géog. mod.) village de Hollande, su...</td>\n",
+       "      <td>VOORHOUT</td>\n",
+       "      <td>Géographie moderne</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>4117</td>\n",
+       "      <td>130</td>\n",
+       "      <td>71</td>\n",
+       "      <td>40</td>\n",
+       "      <td>42</td>\n",
+       "      <td>3</td>\n",
+       "      <td>4</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>ville</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>54728</th>\n",
+       "      <td>volume16-3376</td>\n",
+       "      <td>16</td>\n",
+       "      <td>3376</td>\n",
+       "      <td>TRIOMPHE, (Hist. rom.) cérémonie &amp; honneur  ex...</td>\n",
+       "      <td>TRIOMPHE</td>\n",
+       "      <td>Histoire romaine</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>4083</td>\n",
+       "      <td>209</td>\n",
+       "      <td>80</td>\n",
+       "      <td>45</td>\n",
+       "      <td>68</td>\n",
+       "      <td>5</td>\n",
+       "      <td>6</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>32813</th>\n",
+       "      <td>volume10-880</td>\n",
+       "      <td>10</td>\n",
+       "      <td>880</td>\n",
+       "      <td>MASQUE de théatre, (Hist. du théatre des ancie...</td>\n",
+       "      <td>MASQUE de théatre</td>\n",
+       "      <td>Histoire du théatre des anciens</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>4079</td>\n",
+       "      <td>107</td>\n",
+       "      <td>24</td>\n",
+       "      <td>17</td>\n",
+       "      <td>25</td>\n",
+       "      <td>4</td>\n",
+       "      <td>2</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>58403</th>\n",
+       "      <td>volume15-75</td>\n",
+       "      <td>15</td>\n",
+       "      <td>75</td>\n",
+       "      <td>SENSITIVE, (Botan.) plante fort connue par la ...</td>\n",
+       "      <td>SENSITIVE</td>\n",
+       "      <td>Botanique</td>\n",
+       "      <td>Jaucourt</td>\n",
+       "      <td>4050</td>\n",
+       "      <td>66</td>\n",
+       "      <td>41</td>\n",
+       "      <td>5</td>\n",
+       "      <td>17</td>\n",
+       "      <td>2</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>False</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>100 rows × 19 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "            filename  volume  number  \\\n",
+       "48722  volume11-4606      11    4606   \n",
+       "45431  volume10-1265      10    1265   \n",
+       "53811  volume16-3477      16    3477   \n",
+       "52290  volume14-4631      14    4631   \n",
+       "34775  volume12-1392      12    1392   \n",
+       "...              ...     ...     ...   \n",
+       "70369  volume17-2067      17    2067   \n",
+       "64126  volume17-1690      17    1690   \n",
+       "54728  volume16-3376      16    3376   \n",
+       "32813   volume10-880      10     880   \n",
+       "58403    volume15-75      15      75   \n",
+       "\n",
+       "                                                 content            headword  \\\n",
+       "48722  PARIS, (Géog. mod.) ville capitale du royaume ...               PARIS   \n",
+       "45431  MÉDECINE, s. f. (Art & Science.) La Médecine e...            MÉDECINE   \n",
+       "53811  TRIUMVIRAT, s. m. (Hist. rom.) c'est le nom la...          TRIUMVIRAT   \n",
+       "52290  Sculpteurs anciens, (Sculpt. antiq.) comme les...  Sculpteurs anciens   \n",
+       "34775  Pere de l'Église, (Hist. ecclésiast.) on nomme...    Pere de l'Église   \n",
+       "...                                                  ...                 ...   \n",
+       "70369  WANTAGE, (Géog. mod.) bourg à marché d'Anglete...             WANTAGE   \n",
+       "64126  VOORHOUT, (Géog. mod.) village de Hollande, su...            VOORHOUT   \n",
+       "54728  TRIOMPHE, (Hist. rom.) cérémonie & honneur  ex...            TRIOMPHE   \n",
+       "32813  MASQUE de théatre, (Hist. du théatre des ancie...   MASQUE de théatre   \n",
+       "58403  SENSITIVE, (Botan.) plante fort connue par la ...           SENSITIVE   \n",
+       "\n",
+       "                             normClass    author  nb Words  nb EN  \\\n",
+       "48722               Géographie moderne  Jaucourt     20530   1195   \n",
+       "45431                   Arts | Science  Jaucourt     19687    610   \n",
+       "53811                 Histoire romaine  Jaucourt     17669    865   \n",
+       "52290                Sculpture antique  Jaucourt     16692   1121   \n",
+       "34775          Histoire ecclésiastique  Jaucourt     13862    760   \n",
+       "...                                ...       ...       ...    ...   \n",
+       "70369               Géographie moderne  Jaucourt      4143    171   \n",
+       "64126               Géographie moderne  Jaucourt      4117    130   \n",
+       "54728                 Histoire romaine  Jaucourt      4083    209   \n",
+       "32813  Histoire du théatre des anciens  Jaucourt      4079    107   \n",
+       "58403                        Botanique  Jaucourt      4050     66   \n",
+       "\n",
+       "       nb Name EDDA  nb Person  nb ENE  nb ENE Place  nb ENE Person  \\\n",
+       "48722           435        284     547           151             47   \n",
+       "45431           232        175     166            14              7   \n",
+       "53811           269        259     255            25             17   \n",
+       "52290           294        282     349            51             24   \n",
+       "34775           283        167     231            15             20   \n",
+       "...             ...        ...     ...           ...            ...   \n",
+       "70369            66         36      51             7              5   \n",
+       "64126            71         40      42             3              4   \n",
+       "54728            80         45      68             5              6   \n",
+       "32813            24         17      25             4              2   \n",
+       "58403            41          5      17             2              1   \n",
+       "\n",
+       "       nb EN geocoded  nb EN EDDA geocoded   type  latlong  latlong value  \n",
+       "48722               0                    0  ville     True            NaN  \n",
+       "45431               0                    0    NaN    False            NaN  \n",
+       "53811               0                    0    NaN    False            NaN  \n",
+       "52290               0                    0    NaN    False            NaN  \n",
+       "34775               0                    0    NaN    False            NaN  \n",
+       "...               ...                  ...    ...      ...            ...  \n",
+       "70369               0                    0  ville    False            NaN  \n",
+       "64126               0                    0  ville    False            NaN  \n",
+       "54728               0                    0    NaN    False            NaN  \n",
+       "32813               0                    0    NaN    False            NaN  \n",
+       "58403               0                    0    NaN    False            NaN  \n",
+       "\n",
+       "[100 rows x 19 columns]"
+      ]
+     },
+     "execution_count": 73,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sample_long_j.head(100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 71,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_long_d.to_csv('../Data/ArticlesLongDiderot-Sample100-21.10.11.tsv', sep='\\t', index=False)\n",
+    "sample_long_j.to_csv('../Data/ArticlesLongJaucourt-Sample100-21.10.11.tsv', sep='\\t', index=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}