bash install.sh
The following command evaluates LTPO on the AIME2024 benchmark using LLaMA-3.1-8B-Instruct. To evaluate other models or benchmarks, change the corresponding arguments.
bash scripts/run_ltpo.sh
The detailed responses generated by the LLM are stored in output/logistics.pt.
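The LTPO run and both baseline runs all name output/logistics.pt as the place where responses are stored. If a later run overwrites that file (an assumption, not something the scripts are documented to do), you may want to copy it aside after each evaluation. The sketch below defines a hypothetical helper, save_responses, that copies the file to output/<label>.pt. The function name and the label scheme are illustrative only and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): archive output/logistics.pt
# under a run-specific name. Assumption: every evaluation script writes to the
# same output/logistics.pt, so a later run could overwrite earlier responses.
save_responses() {
  local label="$1"
  local src="output/logistics.pt"
  if [[ -z "$label" ]]; then
    echo "usage: save_responses <label>" >&2
    return 1
  fi
  if [[ ! -f "$src" ]]; then
    echo "no $src found; run an evaluation first" >&2
    return 1
  fi
  # Copy rather than move, so the original path still holds the latest run.
  cp "$src" "output/${label}.pt"
  echo "saved output/${label}.pt"
}
```

For example, run `bash scripts/run_ltpo.sh && save_responses ltpo-aime2024-llama3.1-8b` from the repository root.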
The following command evaluates the Zero-Shot CoT baseline on all five reasoning benchmarks.
bash scripts/batch_baselines_cot.sh
The output logs are written to the logs directory, with file names prefixed with Baseline-CoT.
The detailed responses generated by the LLM are stored in output/logistics.pt.
The following command evaluates the Zero-Shot CoT-Unk baseline on all five reasoning benchmarks.
bash scripts/batch_baselines_cot_unk.sh
The output logs are written to the logs directory, with file names prefixed with Baseline-CoT-Unk.
The detailed responses generated by the LLM are stored in output/logistics.pt.
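Because "Baseline-CoT" is itself a prefix of "Baseline-CoT-Unk", a plain glob such as logs/Baseline-CoT* also matches the CoT-Unk logs. The sketch below uses a hypothetical helper, list_baseline_logs, to keep the two baselines apart with find. It assumes only what is stated above: log file names in the logs directory begin with the given prefix. Anything after the prefix is not assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): list the log files of one
# baseline. "Baseline-CoT" is also a prefix of "Baseline-CoT-Unk", so the
# CoT-Unk files are excluded explicitly when listing the plain CoT logs.
list_baseline_logs() {
  local which="$1"   # "cot" or "cot-unk"
  case "$which" in
    cot)
      find logs -maxdepth 1 -type f -name 'Baseline-CoT*' \
        ! -name 'Baseline-CoT-Unk*' | sort ;;
    cot-unk)
      find logs -maxdepth 1 -type f -name 'Baseline-CoT-Unk*' | sort ;;
    *)
      echo "usage: list_baseline_logs cot|cot-unk" >&2
      return 1 ;;
  esac
}
```

Run it from the repository root, e.g. `list_baseline_logs cot` or `list_baseline_logs cot-unk`.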