An analysis of token efficiency in Large Reasoning Models (LRMs), investigating whether open-weight models systematically require more tokens than closed-weight models for comparable reasoning tasks.
The complete analysis and findings are detailed in the research report *Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark*, available as a draft and in a published version at Nous Research.
See the `/recent_figures` folder for updated figures covering the latest models (added: Deepseek V3.1, fixed GPT-OSS, GPT-5, Hermes4-405b, Deepseek V3.2, Sonnet 4.5).
The dataset is also available on Hugging Face.
This repository contains the pipeline used to generate the data and figures for our analysis of token efficiency patterns across different categories of large language models.
```
LRMTokenEconomy/
├── data/
│   ├── detailed_evaluations_*.json   # Detailed model evaluation results
│   ├── evaluation_summary_*.json     # Summary statistics per model
│   └── output_queries_*.json         # Query results and token usage data
├── evalset/                  # Evaluation prompts and test cases
├── figures/                  # Generated figures and charts
├── report/
│   ├── images/               # Report figures and charts
│   └── report.md             # Final analysis report
├── analyze_*.py              # Analysis and visualization scripts
├── query-script*.py          # Model querying scripts
├── evaluation-script.py      # Evaluation processing
├── aggregate_results.py      # Results aggregation
├── evaluation_stats.csv      # Aggregated evaluation statistics
└── model_prices.csv          # Model pricing data (auto-generated)
```
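The JSON files under `data/` can be inspected directly with a short script. The sketch below only loads the files and reports how many were found; it makes no assumptions about the internal schema, which varies by file type.

```python
import glob
import json

# Collect the query result files shipped in data/; the wildcard matches the
# per-run suffix in the file names.
records = {}
for path in sorted(glob.glob("data/output_queries_*.json")):
    with open(path, "r", encoding="utf-8") as f:
        records[path] = json.load(f)

print(f"Loaded {len(records)} query result file(s)")
```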
Clone the repository and install dependencies:

```bash
git clone https://github.com/cpldcpu/LRMTokenEconomy.git
cd LRMTokenEconomy
pip install -r requirements.txt
```

For running new evaluations, you'll need API credentials for OpenRouter and Google AI (for Gemini models).
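A quick way to confirm the credentials are in place before starting a run is to check the environment. The variable names below are assumptions for illustration only; check the query scripts for the names they actually read.

```python
import os

# Hypothetical credential variable names; adjust to match the query scripts.
for var in ("OPENROUTER_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(var):
        print(f"Warning: {var} is not set")
```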
The repository includes several analysis scripts to reproduce the figures used in the research:
```bash
# Token efficiency analysis
python analyze_prompts.py --preset math
python analyze_prompts.py --preset logic_puzzle
python analyze_prompts.py --preset knowledge

# Cost and other analyses
python analyze_cost.py
python analyze_cot_transcription.py
python analyze_model_trends.py
python analyze_wordstats.py
```

Use the `--help` command line flag with each script to see available options and configurations.
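To regenerate the token-efficiency figures for all three prompt categories in one pass, a small wrapper like the following can be used (a convenience sketch using only the presets listed above, run from the repository root):

```python
import subprocess
import sys

# Run the token-efficiency analysis once per prompt category.
for preset in ("math", "logic_puzzle", "knowledge"):
    subprocess.run([sys.executable, "analyze_prompts.py", "--preset", preset], check=True)
```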
```bash
# Query models and evaluate results
python query-script.py --config query_config.json
python evaluation-script.py
python aggregate_results.py
```

Configuration files include `query_config.json` (main configuration), `query_config_full.json` (full model evaluation), and `query_config_recent.json` (recent models only).
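The three stages can also be chained into a single run. A minimal sketch, assuming the scripts are invoked from the repository root with one of the configuration files above:

```python
import subprocess
import sys

# Full pipeline: query the models, evaluate the responses, aggregate the results.
# query_config_recent.json restricts the run to recent models; substitute
# query_config.json or query_config_full.json as needed.
steps = [
    [sys.executable, "query-script.py", "--config", "query_config_recent.json"],
    [sys.executable, "evaluation-script.py"],
    [sys.executable, "aggregate_results.py"],
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```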
If you use this research or data in your work, please cite:
```bibtex
@misc{lrm_token_economy_2025,
  title={Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark},
  author={TSB},
  year={2025},
  month={August},
  url={https://github.com/cpldcpu/LRMTokenEconomy}
}
```