```bash
git clone https://github.com/yourusername/micota.git
cd micota
pip install -r requirements.txt
```

To generate training data from raw sources:
```bash
bash scripts/run_data_processing.sh
```

Output: the processed data will be saved to `data/processed/filtered_result.json` with the following structure:

```json
{
  "instruction": "...",
  "output": "...",
  "answer": "...",
  "resp_answer": "..."
}
```

We adopted DARE as our model merging method, implemented with the mergekit framework. Set up the environment:

```bash
cd mergekit
conda create -n mergekit python=3.10
conda activate mergekit
pip install -e .
```

Run the merge and write the merged model to `saves/model`:

```bash
mergekit-yaml configs/dares_ties.yml saves/model
```
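`configs/dares_ties.yml` follows mergekit's standard configuration format. Below is a minimal sketch of what a DARE-TIES merge config can look like; the model names, densities, and weights are illustrative placeholders, not the settings used in this repository:

```yaml
# Sketch of a DARE-TIES merge config for mergekit (all model names are placeholders).
models:
  - model: org/finetuned-model-a   # placeholder fine-tuned model
    parameters:
      density: 0.5                 # fraction of delta parameters kept by DARE
      weight: 0.5                  # mixing weight of this model in the merge
  - model: org/finetuned-model-b   # placeholder fine-tuned model
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: org/base-model         # placeholder; the shared base of the models above
dtype: bfloat16
```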
We use the LLaMA-Factory framework for model training. Set up the environment:

```bash
cd LLaMA-Factory
conda create -n llama_factory python=3.10
conda activate llama_factory
pip install -e ".[torch,metrics]"
pip install deepspeed
```

Configure the training parameters in a YAML file such as `configs/3B.yaml`.
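A minimal sketch of such a training config, using LLaMA-Factory's standard SFT keys; the model path, dataset name, template, and hyperparameters below are illustrative placeholders, not the values used in this repository:

```yaml
### model
model_name_or_path: org/base-3b-model   # placeholder base model

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json   # ZeRO-3 config shipped with LLaMA-Factory

### dataset
dataset: micota_sft    # placeholder; register the processed data in data/dataset_info.json
template: qwen         # placeholder chat template
cutoff_len: 16384

### output
output_dir: saves/3B-sft
logging_steps: 10
save_steps: 500

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
```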
Then launch training:

```bash
llamafactory-cli train configs/3B.yaml
```

We employ lm-evaluation-harness, a tool for evaluating the performance of the fine-tuned models. Set up the environment:

```bash
cd lm-evaluation-harness
conda create -n lm-evaluation-harness python=3.10
conda activate lm-evaluation-harness
pip install -e .
```

We provide comprehensive evaluation across multiple mathematical reasoning benchmarks:

```bash
lm_eval --model vllm \
--model_args "pretrained=Model_Path,tensor_parallel_size=4,gpu_memory_utilization=0.85,max_model_len=16000,enforce_eager=True" \
--tasks gsm8k_zero,AMC,AIME,Olympiad,hendrycks_math_500 \
--batch_size auto \
--gen_kwargs do_sample=false,temperature=0,max_gen_toks=16000 \
--output_path results/micota \
--apply_chat_template \
  --log_samples
```

For the AMC, AIME, Olympiad, and hendrycks_math_500 tasks, we use the customized evaluation tasks and scripts from the Small-Model-Learnability-Gap repository. Specifically, we adopt the task configurations and evaluation frameworks provided in its lm-evaluation-harness directory to assess model performance on complex reasoning benchmarks.
This implementation is based on the original work released under the MIT License, and we thank the authors for their open-source contributions.
| Task | Dataset | Description |
|---|---|---|
| GSM8K | Grade School Math | Basic arithmetic and reasoning |
| AMC | American Mathematics Competitions | Advanced problem solving |
| AIME | American Invitational Mathematics Examination | Competition-level problems |
| OlympiadBench | Math Olympiad | Olympiad-level problems |
| Hendrycks Math | MATH (500-problem subset) | Diverse mathematical concepts |
This repository is built upon LLaMA-Factory, lm-evaluation-harness, mergekit, Small-Model-Learnability-Gap, and hiyouga/math. We would like to thank all contributors for their support.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use MiCoTA in your research, please cite our work:
```bibtex
@misc{ding2025micotabridginglearnabilitygap,
      title={MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants},
      author={Dongyi Ding and Tiannan Wang and Chenghao Zhu and Meiling Tao and Yuchen Eleanor Jiang and Wangchunshu Zhou},
      year={2025},
      eprint={2507.01887},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.01887},
}
```