This repository contains the source code for the paper *Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection*.
🎯 Overview • ⚙️ Set Up • 🔧 Reproduction Guide

## 🎯 Overview
ORION is a reasoning distillation framework that refines teacher Chains-of-Thought (CoTs) through an Error-Aware Self-Reflection process. It addresses the key limitation of existing long-form CoT distillation methods—namely, the mismatch between teacher reasoning traces and the student model’s learning capacity. ORION enables the student model to actively refine teacher CoTs by incorporating its own solution errors, generating supervision signals that are more coherent, logically consistent, and tailored to its reasoning ability. Experiments on multiple mathematical reasoning benchmarks show that ORION consistently improves performance across different model architectures, demonstrating its robustness and generality.
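In pseudocode terms, the training-data construction looks roughly like the sketch below. The prompt wording, function names, and data fields are illustrative placeholders rather than the repository's actual API; the real implementation lives in the scripts referenced under the Reproduction Guide.

```python
# Illustrative sketch of Error-Aware Self-Reflection (not the repo's actual API).
# `generate_fn` stands in for any LLM call, e.g. a vLLM or transformers pipeline.
from typing import Callable, Dict, List

REFLECTION_PROMPT = (
    "Question:\n{question}\n\n"
    "Teacher solution:\n{teacher_cot}\n\n"
    "Your earlier attempt (which may contain errors):\n{student_attempt}\n\n"
    "Rewrite the teacher solution so it stays correct while explicitly "
    "addressing the errors in your earlier attempt."
)

def build_orion_sft_data(
    examples: List[Dict[str, str]],
    generate_fn: Callable[[str], str],
) -> List[Dict[str, str]]:
    """For each problem, let the student refine the teacher CoT using its own errors."""
    sft_data = []
    for ex in examples:
        # 1) Sample the student's own (possibly wrong) solution.
        student_attempt = generate_fn(ex["question"])
        # 2) Error-aware self-reflection: refine the teacher CoT against that attempt.
        refined_cot = generate_fn(
            REFLECTION_PROMPT.format(
                question=ex["question"],
                teacher_cot=ex["teacher_cot"],
                student_attempt=student_attempt,
            )
        )
        # 3) The refined CoT becomes the distillation target.
        sft_data.append({"instruction": ex["question"], "output": refined_cot})
    return sft_data
```

Because each refined CoT is conditioned on the student's own failed attempt, the resulting supervision stays within the student's reasoning capacity.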
## ⚙️ Set Up

Use `git clone` to download this project:
```bash
conda create -n ORION python=3.10
conda activate ORION
git clone https://github.com/NEUIR/ORION.git
cd ORION
pip install -r requirements.txt --force-reinstall --no-deps --no-cache-dir
```
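Since `--no-deps` disables dependency resolution, it is worth a quick check that the core packages resolved cleanly. The package names below are assumptions about what `requirements.txt` pins:

```python
# Sanity check after the --no-deps install: confirm core packages are importable.
# The package list is an assumption about requirements.txt, not taken from the repo.
import importlib.util

for pkg in ("torch", "transformers", "vllm"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'ok' if found else 'missing -- check requirements.txt'}")
```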
The fine-tuning step uses LLaMA-Factory. Refer to https://github.com/hiyouga/LLaMA-Factory for detailed instructions.

```bash
conda create -n llama_factory python=3.10
conda activate llama_factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```
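Before the SFT step in the Reproduction Guide, the refined CoTs need to be visible to LLaMA-Factory as a registered dataset. `scripts/sft.sh` may already take care of this; the sketch below shows a manual registration, with all file and dataset names hypothetical:

```python
# Sketch: expose ORION's refined-CoT data to LLaMA-Factory as a named dataset.
# All paths and the dataset name are hypothetical; scripts/sft.sh may do this already.
import json

# Refined CoTs in LLaMA-Factory's alpaca format: {"instruction", "input", "output"}.
with open("orion_refined_cots.json") as f:
    records = json.load(f)
with open("LLaMA-Factory/data/orion_sft.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# Register the file in dataset_info.json so it can be referenced as `orion_sft`.
with open("LLaMA-Factory/data/dataset_info.json") as f:
    info = json.load(f)
info["orion_sft"] = {"file_name": "orion_sft.json"}
with open("LLaMA-Factory/data/dataset_info.json", "w") as f:
    json.dump(info, f, indent=2)
```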
pip install -e ".[torch,metrics]"bash scripts/Response_sampling.sh bash scripts/Self-Reflection.sh bash scripts/sft.sh python src/eval_final.pyIf you have questions, suggestions, and bug reports, please email: