Named entity recognition on Workflow data

Description

This directory contains all the information and scripts needed to reproduce the results presented in:

@misc{sebe2024extractinginformationlowresourcesetting,
      title={Extracting Information in a Low-resource Setting: Case Study on Bioinformatics Workflows}, 
      author={Clémence Sebe and Sarah Cohen-Boulakia and Olivier Ferret and Aurélie Névéol},
      year={2024},
      eprint={2411.19295},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.19295}, 
}

This paper has been accepted at IDA 2025.

Before You Start

Before running the experiments, you need to:

Contents

This repository includes:

  • A Python script, run_nlstruct.py, to launch the NER experiments. Its header must be edited first to set the data link and the model to be trained.
  • A Jupyter notebook, add_voc_bioinfo.ipynb, to integrate the names of bioinformatics tools and binaries into the models.
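
As a rough illustration of the kind of step the notebook is concerned with, the sketch below selects which tool names would be new to a model's vocabulary. It is not taken from add_voc_bioinfo.ipynb: the function name select_new_tokens, the toy vocabulary, and the tool list are all hypothetical, and it assumes that "integrating" tools means extending the tokenizer vocabulary.

```python
def select_new_tokens(candidates, vocab):
    """Return candidate terms absent from vocab, deduplicated, in first-seen order.

    Hypothetical helper for illustration; not part of this repository.
    """
    seen = set()
    new_tokens = []
    for term in candidates:
        term = term.strip()  # ignore surrounding whitespace
        if term and term not in vocab and term not in seen:
            seen.add(term)
            new_tokens.append(term)
    return new_tokens


# Toy vocabulary and tool names, for illustration only (assumed values).
toy_vocab = {"the", "genome", "align", "##ment", "bowtie"}
tool_names = ["samtools", "bowtie", "FastQC", "samtools", " bwa "]

new_tokens = select_new_tokens(tool_names, toy_vocab)
print(new_tokens)
```

With a Hugging Face Transformers model, a list like this could then be passed to tokenizer.add_tokens(new_tokens), followed by model.resize_token_embeddings(len(tokenizer)). Whether the notebook proceeds this way is an assumption.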

Licence

This project is licensed under the Apache License, Version 2.0.

Funding

This work received support from the French National Research Agency (ANR) under the France 2030 program, reference ANR-22-PESN-0007.