
Quick Start

Install Dependencies

bash install.sh

Evaluate LTPO

The following command evaluates LTPO on the AIME2024 benchmark using LLaMA-3.1-8B-Instruct. To evaluate a different model or benchmark, change the corresponding arguments.

bash scripts/run_ltpo.sh

The detailed responses generated by the LLM are stored in output/logistics.pt.
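Every evaluation in this README reports the same output path, output/logistics.pt, so a later run may replace the responses from an earlier one. The sketch below keeps a copy under a run-specific name. The function name archive_output and the tag argument are hypothetical and not part of this repository, and whether the file is actually overwritten between runs is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): copy the responses file
# to a run-specific name so the next evaluation cannot replace it.
# Usage: archive_output TAG [SOURCE]   (SOURCE defaults to output/logistics.pt)
archive_output() {
    local tag="$1"
    local src="${2:-output/logistics.pt}"
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    # output/logistics.pt -> output/logistics-TAG.pt
    local dest="${src%.pt}-${tag}.pt"
    cp -- "$src" "$dest"
    echo "$dest"
}

# Example (the tag is an arbitrary label):
#   bash scripts/run_ltpo.sh && archive_output ltpo-aime2024
```

The function prints the path of the copy, so it can be recorded alongside the corresponding log.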

Evaluate Zero-Shot CoT Baseline

The following command evaluates the Zero-Shot CoT baseline on all five reasoning benchmarks.

bash scripts/batch_baselines_cot.sh

The output logs are written to the logs directory, with file names prefixed with Baseline-CoT.

The detailed responses generated by the LLM are stored in output/logistics.pt.

Evaluate Zero-Shot CoT-Unk Baseline

The following command evaluates the Zero-Shot CoT-Unk baseline on all five reasoning benchmarks.

bash scripts/batch_baselines_cot_unk.sh

The output logs are written to the logs directory, with file names prefixed with Baseline-CoT-Unk.
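Because Baseline-CoT is itself a prefix of Baseline-CoT-Unk, a plain glob such as logs/Baseline-CoT* matches the logs of both baselines. The sketch below lists only the Zero-Shot CoT logs. The function name list_cot_logs is hypothetical, and nothing is assumed about the log file names beyond the two prefixes stated in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the Zero-Shot CoT
# logs while skipping the CoT-Unk logs, which share the same leading prefix.
# Usage: list_cot_logs [DIR]   (DIR defaults to logs)
list_cot_logs() {
    local dir="${1:-logs}"
    local f
    for f in "$dir"/Baseline-CoT*; do
        [ -e "$f" ] || continue            # glob matched nothing
        case "${f##*/}" in
            Baseline-CoT-Unk*) ;;          # belongs to the CoT-Unk baseline
            *) echo "$f" ;;
        esac
    done
}

# Example:
#   list_cot_logs logs
```

The CoT-Unk logs need no such filter, since logs/Baseline-CoT-Unk* matches them alone.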

The detailed responses generated by the LLM are stored in output/logistics.pt.

ltpo2025/LTPO
