# Named entity recognition on Workflow data

## Description

This directory contains all the information and scripts needed to reproduce the results presented in:

```
@misc{sebe2024extractinginformationlowresourcesetting,
  title={Extracting Information in a Low-resource Setting: Case Study on Bioinformatics Workflows},
  author={Clémence Sebe and Sarah Cohen-Boulakia and Olivier Ferret and Aurélie Névéol},
  year={2024},
  eprint={2411.19295},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.19295},
}
```

This paper has been accepted at IDA 2025.

## Before You Start

Before running the experiments, you need to:

* Download the dataset: https://doi.org/10.5281/zenodo.14879025
* Clone this Git repository for the experiments with Nlstruct: https://github.com/ClemenceS/nlstruct
* Clone this Git repository for the auto-regressive experiments: https://github.com/ClemenceS/autoregressive_ner

## Contents

This repository includes:

* A Python script, `run_nlstruct.py`, to launch the NER experiments. Its header information (data link and model to be trained) must be modified before running it.
* A Jupyter notebook, `add_voc_bioinfo.ipynb`, to integrate bioinformatics tools and binaries into models.

## Licence

This project is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Funding

This work was supported by the National Research Agency under the France 2030 program, reference ANR-22-PESN-0007.