Named entity recognition on Workflow data
Description
This directory contains the information and scripts needed to reproduce the results presented in:
@misc{sebe2024extractinginformationlowresourcesetting,
title={Extracting Information in a Low-resource Setting: Case Study on Bioinformatics Workflows},
author={Clémence Sebe and Sarah Cohen-Boulakia and Olivier Ferret and Aurélie Névéol},
year={2024},
eprint={2411.19295},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.19295},
}
This paper has been accepted at IDA 2025.
Before You Start
Before running the experiments, you need to:
- Download the dataset: https://doi.org/10.5281/zenodo.14879025
- Clone this Git repository for the experiments with Nlstruct: https://github.com/ClemenceS/nlstruct
- Clone this Git repository for the auto-regressive experiments: https://github.com/ClemenceS/autoregressive_ner
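The two clone steps above could be scripted as in the minimal sketch below. It is only an illustration, not part of this repository: the helper names (`clone_command`, `clone_all`) and the `external` destination directory are assumptions, and the script defaults to a dry run that prints the `git clone` commands instead of executing them. The dataset still has to be downloaded separately from the Zenodo link.

```python
import subprocess
from pathlib import Path

# Repositories listed in "Before You Start".
REPOS = [
    "https://github.com/ClemenceS/nlstruct",
    "https://github.com/ClemenceS/autoregressive_ner",
]


def clone_command(url, dest_root):
    """Build the git clone command for one repository (hypothetical helper)."""
    name = url.rstrip("/").split("/")[-1]
    return ["git", "clone", url, str(Path(dest_root) / name)]


def clone_all(dest_root="external", dry_run=True):
    """Print the clone commands (dry run) or execute them, and return them.

    `dest_root` is an assumed location; choose any directory you like.
    """
    commands = [clone_command(url, dest_root) for url in REPOS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default; call clone_all(dry_run=False) to actually clone.
    clone_all()
```

Calling `clone_all(dry_run=False)` runs the commands with `subprocess.run` and raises an error if `git` fails.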
Contents
This repository includes:
- A Python script, run_nlstruct.py, to launch the NER experiments. Before running it, edit the header of the script to set the data link and the model to be trained.
- A Jupyter notebook, add_voc_bioinfo.ipynb, to integrate bioinformatics tools and binaries into models.
Licence
This project is licensed under the Apache License, Version 2.0.
Funding
This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.