Published October 9, 2025 | Version v1.0
Dataset Open

MicrobELP - annotated training sets and unannotated test set documents

  • 1. Imperial College London
  • 2. University of Lusaka
  • 3. University of Nottingham

Description

This submission contains 3 sets of data:

  • machine_train_with_ref.zip
  • machine_train_without_ref.zip
  • test_without_annotations.zip

The training set is provided both with and without references. It was machine-annotated using the pipeline code available in the GitHub repository.

The test set documents are unannotated and can be used for performance benchmarking on Codabench.

All files are given in BioC-JSON format obtained using Auto-CORPus.
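As a rough illustration of working with these archives, the sketch below walks every JSON file inside a zip and counts documents, passages, and annotations. It assumes the common BioC-JSON layout (a top-level object with a "documents" list, each document holding "passages", each passage optionally holding "annotations"); the exact keys produced by Auto-CORPus for this dataset are not stated here, so treat the key names and the helper names `iter_bioc_documents` and `summarise` as assumptions.

```python
import json
import zipfile


def iter_bioc_documents(zip_source):
    """Yield (member_name, document) pairs from every JSON file in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue  # skip directories and non-JSON members
            with archive.open(name) as handle:
                collection = json.load(handle)
            # Assumption: each file is a BioC collection with a "documents" list.
            for document in collection.get("documents", []):
                yield name, document


def summarise(zip_source):
    """Count documents, passages, and annotations found in an archive."""
    counts = {"documents": 0, "passages": 0, "annotations": 0}
    for _, document in iter_bioc_documents(zip_source):
        counts["documents"] += 1
        for passage in document.get("passages", []):
            counts["passages"] += 1
            # Unannotated files may lack this key entirely, hence the default.
            counts["annotations"] += len(passage.get("annotations", []))
    return counts
```

Calling `summarise("machine_train_with_ref.zip")` on a downloaded archive would return the three counts as a dictionary; for the unannotated test set the annotation count would be expected to be zero if the assumed layout holds.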

Files

Files (44.8 MB)

Size      Checksum
24.4 MB   md5:27e2afdcca771069b3811b6da639fbc5
17.3 MB   md5:683b73b130ef2b23e1b5056d6acd79a2
3.1 MB    md5:01210017381f9fbb5d3fab6b166c3bfe
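The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the listing above does not say which checksum belongs to which archive, so the pairing of file and checksum is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:27e2afdcca771069b3811b6da639fbc5")` returns `True` only if the file at `local_path` (a hypothetical local download location) has that digest.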

Additional details

Related works

Is metadata for
Publication: 10.1101/2025.08.29.671515v1 (DOI)

Software

Repository URL
https://github.com/omicsNLP/microbELP
Development Status
Active