Commit 8da28111 authored by Alice Brenon

Adapt notebook computing confusion matrices

parent bf8bf178
%% Cell type:markdown id:11511929 tags:
# Confusion Matrices
We start by including the EDdA modules from the [project's gitlab](https://gitlab.liris.cnrs.fr/geode/EDdA-Classification).
%% Cell type:code id:a5f3d434 tags:
``` /gnu/store/2rpsj69fzmcnafz4rml0blrynfayxqzr-python-wrapper-3.9.9/bin/python
from EDdA import data
from EDdA.store import preparePath
from EDdA.classification import confusionMatrix, metrics, toPNG, topNGrams
import os
```
%% Cell type:markdown id:4c3064ea tags:
Then we load the training set into a new data structure called a `Source`, which contains a `pandas` `DataFrame` and a hash computed from the exact list of article "coordinates" (volume and article number, in the order they appear) found in the original TSV file.
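To illustrate why the order of the coordinates matters, here is a minimal sketch of an order-sensitive hash. The function name `coordinates_hash`, the choice of SHA-256 and the serialisation format are assumptions made for this example; the notebook does not show how `EDdA.data` actually computes `source.hash`.

```python
import hashlib


def coordinates_hash(coordinates):
    """Hash an ordered list of (volume, article number) pairs.

    Hypothetical helper: the real EDdA implementation may differ.
    """
    digest = hashlib.sha256()
    for volume, number in coordinates:
        # Feeding the pairs one after the other makes the result
        # depend on their order, not only on their set.
        digest.update(f"{volume}:{number}\n".encode("utf-8"))
    return digest.hexdigest()


original = [(1, 5), (1, 6), (2, 3)]
reordered = [(1, 6), (1, 5), (2, 3)]

# The same list in the same order hashes to the same value ...
same_order = coordinates_hash(original) == coordinates_hash(list(original))
# ... while the same articles in another order do not.
other_order = coordinates_hash(original) == coordinates_hash(reordered)
print(same_order, other_order)
```

With such a hash, two training sets that contain the same articles in a different order end up in different output directories.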
%% Cell type:code id:5ad65685 tags:
``` /gnu/store/2rpsj69fzmcnafz4rml0blrynfayxqzr-python-wrapper-3.9.9/bin/python
source = data.load('training_set')
```
%% Cell type:markdown id:4e958e04 tags:
This function standardises the names of the files that will hold the confusion matrices, and creates their parent directory if needed.
%% Cell type:code id:545bdb4f tags:
``` /gnu/store/2rpsj69fzmcnafz4rml0blrynfayxqzr-python-wrapper-3.9.9/bin/python
def preparePath(root, source, n, ranks, metricName):
    path = "{root}/confusionMatrix/{inputHash}/{n}grams_top{ranks}_{name}.png".format(
        root=root,
        inputHash=source.hash,
        n=n,
        ranks=ranks,
        name=metricName
    )
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```
%% Cell type:markdown id:4079559f tags:
Then we only have to loop over the n-gram size (`n`), the number of `ranks` to keep when computing the most frequent ones, and the comparison method (the metric's `name`).
We loop over the n-gram size (`n`), the number of `ranks` to keep when computing the most frequent ones, and the comparison method (the metric's `name`).
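The two comparison methods are only referred to by name here. As a rough sketch of what such metrics could compute, assuming each class is summarised by a dictionary mapping its most frequent n-grams to their counts, one might write the following. The functions `colinearity`, `keys_intersection` and `pairwise_matrix`, as well as the sample data, are hypothetical and are not the definitions found in `EDdA.classification`.

```python
import math


def colinearity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keys_intersection(a, b):
    """Share of the keys of the first vector also present in the second."""
    if not a:
        return 0.0
    return len(a.keys() & b.keys()) / len(a)


def pairwise_matrix(vectors, metric):
    """Compare every class with every other one using the given metric."""
    labels = sorted(vectors)
    return [[metric(vectors[row], vectors[col]) for col in labels]
            for row in labels]


# Made-up n-gram counts for two classes, for illustration only.
vectors = {
    "classA": {"x": 3, "y": 1},
    "classB": {"x": 1, "z": 2},
}

cosine_matrix = pairwise_matrix(vectors, colinearity)
overlap_matrix = pairwise_matrix(vectors, keys_intersection)
```

Under these assumptions, each class compared with itself scores 1 on the diagonal, and the off-diagonal cells show how much vocabulary two classes share.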
%% Cell type:code id:b39c5be0 tags:
``` /gnu/store/2rpsj69fzmcnafz4rml0blrynfayxqzr-python-wrapper-3.9.9/bin/python
for n in range(1, 4):
    for ranks in [10, 50, 100]:
        vectorizer = topNGrams(source, n, ranks)
        for name in ['colinearity', 'keysIntersection']:
            imagePath = preparePath('.', source, n, ranks, name)
            imagePath = preparePath(f"confusionMatrix/{source.hash}/{n}grams_top{ranks}_{name}.png")
            toPNG(confusionMatrix(vectorizer, metrics[name]), imagePath)
```