Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection

This repository contains the source code for the paper: Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection.

• 🎯 Overview • ⚙️ Set Up • 🔧 Reproduction Guide

• ✈️ Experimental Result • 📃 Acknowledgement • 📝 Citation • 📨 Contact

🎯Overview

ORION is a reasoning distillation framework that refines teacher Chains-of-Thought (CoTs) through an Error-Aware Self-Reflection process. It addresses a key limitation of existing long-form CoT distillation methods: the mismatch between teacher reasoning traces and the student model's learning capacity. ORION lets the student model actively refine teacher CoTs by incorporating its own solution errors, which yields supervision signals that are more coherent, logically consistent, and tailored to the student's reasoning ability. Experiments on multiple mathematical reasoning benchmarks show that ORION consistently improves performance across different model architectures, demonstrating its robustness and generality.
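To make the idea of error-aware self-reflection more concrete, here is a minimal sketch of how a reflection prompt might be assembled from a teacher CoT and the student's own incorrect attempts. It is an illustration under stated assumptions, not the code in this repository: the `Example` fields, the exact-match correctness check, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """One training problem. All field names are hypothetical."""
    question: str
    gold_answer: str
    teacher_cot: str
    # Sampled student attempts as (reasoning, final_answer) pairs.
    student_attempts: list[tuple[str, str]] = field(default_factory=list)


def collect_errors(example: Example) -> list[str]:
    """Return the reasoning of every attempt whose final answer is wrong.

    Assumption: an answer is correct if it equals the gold answer after
    trimming surrounding whitespace.
    """
    gold = example.gold_answer.strip()
    return [r for r, a in example.student_attempts if a.strip() != gold]


def build_reflection_prompt(example: Example, max_errors: int = 2) -> str:
    """Assemble a prompt asking the student to rewrite the teacher CoT.

    The wording is illustrative only, not the prompt used by ORION.
    """
    errors = collect_errors(example)[:max_errors]
    error_block = "\n".join(f"Attempt {i}: {e}" for i, e in enumerate(errors, 1))
    return (
        f"Question:\n{example.question}\n\n"
        f"Reference solution:\n{example.teacher_cot}\n\n"
        f"Your earlier incorrect attempts:\n{error_block or '(none)'}\n\n"
        "Rewrite the reference solution in your own words, explicitly "
        "avoiding the mistakes above."
    )
```

In this sketch, the rewritten solutions returned by the student would then serve as the fine-tuning targets in place of the raw teacher CoTs.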

⚙️Set Up

1. Python Environment.

Create a conda environment, clone this repository with git clone, and install the dependencies:

conda create -n ORION python=3.10
conda activate ORION
git clone https://github.com/NEUIR/ORION.git
cd ORION
pip install -r requirements.txt --force-reinstall --no-deps --no-cache-dir

2. Install LLaMA-Factory.

Refer to https://github.com/hiyouga/LLaMA-Factory for detailed instructions.

conda create -n llama_factory python=3.10
conda activate llama_factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

🔧ORION Pipeline

1. Response-Sampling

bash scripts/Response_sampling.sh

2. Self-Reflection

bash scripts/Self-Reflection.sh

3. Training the model

bash scripts/sft.sh

4. Evaluation

python src/eval_final.py
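The four stages above can be chained from a small driver script. The sketch below is a convenience wrapper, not part of the repository: the `run_pipeline` helper and its `dry_run` flag are hypothetical, and it assumes that every stage is launched from the repository root in an environment where its dependencies are available (the training stage may need the llama_factory environment instead).

```python
import subprocess

# Commands copied from the pipeline steps, in execution order.
STAGES = [
    ["bash", "scripts/Response_sampling.sh"],
    ["bash", "scripts/Self-Reflection.sh"],
    ["bash", "scripts/sft.sh"],
    ["python", "src/eval_final.py"],
]


def run_pipeline(dry_run: bool = False) -> list[str]:
    """Run each stage in order and stop at the first failure.

    With dry_run=True nothing is executed; the commands are only
    collected so that the plan can be inspected.
    """
    executed = []
    for cmd in STAGES:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on top of a failed one.
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    # Print the plan without running anything.
    for line in run_pipeline(dry_run=True):
        print(line)
```

Calling `run_pipeline()` without arguments executes the stages for real.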

📨Contact

If you have questions, suggestions, or bug reports, please email:

[email protected]

