This repository contains the code for our EMNLP 2025 paper: "On LLM-Based Scientific Inductive Reasoning Beyond Equations".
Large Language Models (LLMs) have demonstrated strong deductive reasoning skills (e.g., mathematics, programming). However, their ability to perform inductive reasoning in scientific contexts remains underexplored.
We introduce SIRBench-V1, the first benchmark to systematically evaluate LLMs on scientific inductive reasoning tasks beyond mathematical equations.
The benchmark spans 7 tasks across biology and chemistry:

**🧬 Biology**
- DNA Translation
- DNA Table Inference
- DNA Transformation

**⚗️ Chemistry**
- Molecule Design
- Molecule Captioning
- Reaction Prediction
- Name Prediction
Each task requires models to induce underlying scientific rules from examples and apply them to new inputs, rather than simply memorizing known mappings.
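For instance, in the DNA Translation task a model must recover a (possibly counterfactual) codon table from few-shot examples and apply it to an unseen sequence. The toy sketch below illustrates this inductive setting; it is hypothetical illustration code, not the benchmark's own implementation, and it assumes each DNA string aligns one-to-one with its protein string as concatenated 3-letter codons.

```python
def induce_codon_table(examples):
    """Induce a codon -> amino-acid mapping from (dna, protein) pairs.

    Toy simplification: each DNA string is a concatenation of 3-letter
    codons aligned one-to-one with the protein string.
    """
    table = {}
    for dna, protein in examples:
        for i, aa in enumerate(protein):
            table[dna[3 * i:3 * i + 3]] = aa
    return table


def apply_table(table, dna):
    """Translate a new DNA string with the induced table."""
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

A counterfactual table is induced exactly as easily as the standard one, which is why memorized mappings do not help: `induce_codon_table([("ATGGCC", "MA"), ("TTTAAA", "FK")])` yields a table under which `apply_table(table, "GCCATG")` returns `"AM"`.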
This repository builds on the OpenCompass framework, which enables efficient evaluation across different LLMs.
- 📥 Clone this repository
- 🛠️ Install the framework:

  ```shell
  pip install -e .
  ```

- 📦 Install additional dependencies:

  ```shell
  pip install fcd rdkit biopython tenacity
  ```
Each task can be run with its corresponding config file:

```shell
opencompass examples/eval_sirbenchv1_{task}.py
```

For example, to evaluate DNA Transformation:

```shell
opencompass examples/eval_sirbenchv1_dna_transform.py
```
You can modify the config files under `./examples` to test any model supported by OpenCompass.

Note: Before running experiments, please configure your OpenAI API key. For quick experiments, the API key can be added directly to the `examples/eval_sirbenchv1_{task}.py` files.
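For reference, an OpenCompass API-model entry typically looks like the sketch below. This is an assumption about the config shape, not a copy of this repository's files; the exact fields, model name, and abbreviation should follow the entries already present in the config you are editing.

```python
from opencompass.models import OpenAI

models = [
    dict(
        type=OpenAI,
        abbr='gpt-4o',    # label used in result tables (hypothetical choice)
        path='gpt-4o',    # model name passed to the API
        key='sk-...',     # your OpenAI API key goes here
        max_out_len=2048,
        batch_size=8,
    ),
]
```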
- `./examples/`: Evaluation entrypoints for each SIRBench-V1 task.
- `./data/sirbenchv1/`: Processed datasets.
- `opencompass/configs/datasets/sirbenchv1/`: Benchmark dataset configuration files.
- `opencompass/datasets/sirbenchv1/`: Data loaders for SIRBench-V1.
- `opencampasslongicl/opencompass/openicl/icl_inferencer/`: Custom inference strategies:
  - `icl_hr_inferencer.py` (Hypothesis Refinement)
  - `icl_onepass_sc_inferencer.py` (Self-Consistency)
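The core idea behind the self-consistency strategy can be sketched in a few lines: sample several completions for the same prompt and majority-vote the final answer. This is a minimal illustration of the technique, not the repository's actual inferencer.

```python
from collections import Counter


def majority_vote(samples):
    """Return the most frequent answer among sampled completions.

    `samples` stands in for multiple stochastic generations from the
    same prompt; the real inferencer obtains them from the model.
    """
    return Counter(samples).most_common(1)[0][0]
```

For example, `majority_vote(["CCO", "CCO", "CCN"])` returns `"CCO"`: a single aberrant sample is outvoted by the consistent majority.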
We build SIRBench-V1 from authentic and counterfactual tasks using existing scientific resources.
Our benchmark is also available on Hugging Face.
To generate more test samples under different dataset configurations, please download the datasets from the sources below and place them in the specified paths. The configuration files in `opencompass/configs/datasets/sirbenchv1/` can then be modified accordingly.
- Genomic sequences: GenomicLLM_GRCh38
  - `20230906_cds_res_nt2aa_dev.csv`, `20230906_cds_res_nt2aa_test.csv`, `20230906_cds_res_nt2aa_train.csv` -> `data/sirbenchv1/dna_translator/`
- Molecule design & captioning: ChEBI-20
  - `ChEBI-20_data` -> `data/sirbenchv1/chem_molecule_design/`
- Reaction prediction: USPTO-MIT Mixed
  - `uspto_mixed.pickle` -> `data/sirbenchv1/chem_reaction_prediction/`
- Name prediction: PubChem
  - `llm_test.csv` -> `data/sirbenchv1/chem_name_prediction/`
If you use this code or benchmark, please cite our paper:
```bibtex
@inproceedings{lin2025sirbench,
  title     = {On LLM-Based Scientific Inductive Reasoning Beyond Equations},
  author    = {Brian S. Lin and Jiaxin Yuan and Zihan Zhou and Shouli Wang and Shuo Wang and Cunliang Kong and Qi Shi and Yuxuan Li and Liner Yang and Zhiyuan Liu and Maosong Sun},
  booktitle = {Proceedings of EMNLP},
  year      = {2025}
}
```