Chart2Code is a new benchmark designed to evaluate the chart-generation capabilities of LMMs across three progressively challenging levels: chart reproduction, chart editing, and long-table-to-chart generation.
- Level 1 (Chart Reproduction): reproduce a chart from a reference figure and a user query;
- Level 2 (Chart Editing): apply complex modifications such as changing the chart type or adding elements;
- Level 3 (Long-Table to Chart Generation): transform long, information-dense tables into faithful charts following user instructions.
More details about Chart2Code are available on the project page. 🌐
Here we provide a quick-start guide for evaluating LMMs on Chart2Code.
```bash
git clone https://github.com/showlab/Chart2Code.git
conda env create -f environment.yaml
conda activate chart2code
cd Chart2Code
```
Set the API key and API base URL in `.env` for the different LMMs. Claude, Gemini, and GPT are accessed through API proxy providers, while Seed is accessed through the ARK API.
```
OPENAI_API_KEY=${your_api_proxy_provider_key}
ARK_API_KEY=${your_ark_api_key}
OPENAI_API_URL=${your_api_proxy_provider_url}
ARK_BASE_URL=${your_ark_api_base_url}
```
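For reference, here is a minimal sketch of reading these keys at runtime, assuming `python-dotenv`; the repo's own loading code may differ:
```python
# Minimal sketch, assuming python-dotenv; the repo's own loading code may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from ./.env into the process environment

openai_key = os.environ["OPENAI_API_KEY"]  # proxy key for Claude / Gemini / GPT
ark_key = os.environ["ARK_API_KEY"]        # ARK key for Seed models
```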
Download the Chart2Code data from Hugging Face and unzip it under the root directory.
```bash
wget https://huggingface.co/datasets/CSU-JPG/Chart2Code/resolve/main/data.zip
unzip data.zip
```
The file structure should look like this:
```
├── data
│   ├── level1_direct
│   │   ├── 3d_1.png
│   │   ├── 3d_1.py
│   │   └── ...
│   ├── level1_figure
│   │   ├── fig1_density_2
│   │   └── ...
│   ├── level1_customize
│   │   ├── table_1_instruction_2.png
│   │   ├── table_1_instruction_2.py
│   │   ├── table_1_instruction_2_request.txt
│   │   ├── table_1_instruction_2_data.txt
│   │   └── ...
│   ├── level2
│   │   ├── bar_1_v1.png
│   │   ├── bar_1_v1.py
│   │   ├── bar_1_v1_data.txt
│   │   └── ...
│   ├── level3
│   │   ├── table_1.xlsx
│   │   ├── table1_1.png
│   │   ├── table1_1_generate.py
│   │   ├── table1_1.txt
│   │   ├── table1_1_generate.png
│   │   └── ...
│   ├── level1_direct.json
│   ├── level1_figure.json
│   ├── level1_customize.json
│   ├── level2.json
│   └── level3.json
├── Evaluation
└── ...
```
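For intuition, each level-1 reference pairs a rendered chart with the self-contained script that produced it. A hypothetical sketch of what a script such as `3d_1.py` might look like (the real scripts ship in `data.zip`; this only illustrates the format):
```python
# Hypothetical sketch of a level1_direct reference script such as 3d_1.py;
# the real scripts ship in data.zip. This only illustrates the format:
# one self-contained matplotlib program per reference chart.
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")

x = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, x)
Z = np.sin(np.sqrt(X**2 + Y**2))

ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_title("3D Surface")
fig.savefig("3d_1.png", dpi=300)
```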
Inference for each benchmark level is handled by a dedicated shell script in the `scripts/` directory.
You must specify a model for each run, in one of two ways:
- Pass it as an argument (recommended): provide the `MODEL_IDENTIFIER` directly when executing the script.
- Edit the script: set the `MODEL_IDENTIFIER` variable inside the corresponding `.sh` file.

You can set the `LOAD_SOURCE` parameter in the shell script to select how the model is loaded:
- `local` (default): the model is loaded from the `Inference/models` directory.
- `hub`: the model weights are downloaded directly from the Hugging Face Hub.

You can also adjust other parameters such as `GPU_VISIBLE_DEVICES` in the script to fit your hardware setup.
```bash
cd scripts/inference
# For level1_customize
bash inference_customize.sh qwen3_customize_30B
# For level1_direct
bash inference_direct.sh qwen2.5_direct_72B
# For level1_figure
bash inference_figure.sh InternVL_3.5_figure_38B
# For level2
bash inference_level2.sh deepseek_level2
# For level3
bash inference_level3.sh gpt_5_level3
```
Available Models
We now support the following models. Each cell gives the `MODEL_IDENTIFIER` for that level:

| Model Name | level1_customize | level1_direct | level1_figure | level2 | level3 |
|---|---|---|---|---|---|
| InternVL-3.5-38B | InternVL_3.5_customize_38B | InternVL_3.5_direct_38B | InternVL_3.5_figure_38B | InternVL_3.5_level2_38B | InternVL_3.5_level3_38B |
| InternVL-3.5-8B | InternVL_3.5_customize_8B | InternVL_3.5_direct_8B | InternVL_3.5_figure_8B | InternVL_3.5_level2_8B | InternVL_3.5_level3_8B |
| InternVL-3-38B | InternVL_3_customize_38B | InternVL_3_direct_38B | InternVL_3_figure_38B | InternVL_3_level2_38B | InternVL_3_level3_38B |
| InternVL-3-8B | InternVL_3_customize_8B | InternVL_3_direct_8B | InternVL_3_figure_8B | InternVL_3_level2_8B | InternVL_3_level3_8B |
| InternVL-2.5-38B | InternVL_2.5_customize_38B | InternVL_2.5_direct_38B | InternVL_2.5_figure_38B | InternVL_2.5_level2_38B | InternVL_2.5_level3_38B |
| InternVL-2.5-8B | InternVL_2.5_customize_8B | InternVL_2.5_direct_8B | InternVL_2.5_figure_8B | InternVL_2.5_level2_8B | InternVL_2.5_level3_8B |
| Qwen3-VL-30B | qwen3_customize_30B | qwen3_direct_30B | qwen3_figure_30B | qwen3_level2_30B | qwen3_level3_30B |
| Qwen3-VL-30B-think | qwen3_customize_30B_think | qwen3_direct_30B_think | qwen3_figure_30B_think | qwen3_level2_30B_think | qwen3_level3_30B_think |
| Qwen2.5-VL-72B | qwen2.5_customize_72B | qwen2.5_direct_72B | qwen2.5_figure_72B | qwen2.5_level2_72B | qwen2.5_level3_72B |
| Qwen2.5-VL-7B | qwen2.5_customize_7B | qwen2.5_direct_7B | qwen2.5_figure_7B | qwen2.5_level2_7B | qwen2.5_level3_7B |
| Qwen2-VL-72B | qwen2_customize_72B | qwen2_direct_72B | qwen2_figure_72B | qwen2_level2_72B | qwen2_level3_72B |
| Qwen2-VL-7B | qwen2_customize_7B | qwen2_direct_7B | qwen2_figure_7B | qwen2_level2_7B | qwen2_level3_7B |
| MOLMO-7B-D | molmo_customize_7BD | molmo_direct_7BD | molmo_figure_7BD | molmo_level2_7BD | molmo_level3_7BD |
| MIMO-VL-7B-RL-think | mimo_RL_customize_think | mimo_RL_direct_think | mimo_RL_figure_think | mimo_RL_level2_think | mimo_RL_level3_think |
| MIMO-VL-7B-RL-nothink | mimo_RL_customize_nothink | mimo_RL_direct_nothink | mimo_RL_figure_nothink | mimo_RL_level2_nothink | mimo_RL_level3_nothink |
| MIMO-VL-7B-SFT-nothink | mimo_SFT_customize_nothink | mimo_SFT_direct_nothink | mimo_SFT_figure_nothink | mimo_SFT_level2_nothink | mimo_SFT_level3_nothink |
| MIMO-VL-7B-SFT-think | mimo_SFT_customize_think | mimo_SFT_direct_think | mimo_SFT_figure_think | mimo_SFT_level2_think | mimo_SFT_level3_think |
| LLaVA-OV-Qwen2-7B-OV | llava_ov_customize | llava_ov_direct | llava_ov_figure | llava_ov_level2 | llava_ov_level3 |
| LLaVA-OV-Qwen2-7B-SI | llava_si_customize | llava_si_direct | llava_si_figure | llava_si_level2 | llava_si_level3 |
| SEED-1.6-VL | seed_1.6_customize | seed_1.6_direct | seed_1.6_figure | seed_1.6_level2 | seed_1.6_level3 |
| SEED-1.5-VL | seed_1.5_customize | seed_1.5_direct | seed_1.5_figure | seed_1.5_level2 | seed_1.5_level3 |
| Claude-Sonnet-4 | claude_customize | claude_direct | claude_figure | claude_level2 | claude_level3 |
| DeepSeek-VL-7B | deepseek_customize | deepseek_direct | deepseek_figure | deepseek_level2 | deepseek_level3 |
| Gemini-2.5-Pro | gemini_2.5_customize | gemini_2.5_direct | gemini_2.5_figure | gemini_2.5_level2 | gemini_2.5_level3 |
| GLM-4V-9B | glm_customize | glm_direct | glm_figure | glm_level2 | glm_level3 |
| GPT-5 | gpt_5_customize | gpt_5_direct | gpt_5_figure | gpt_5_level2 | gpt_5_level3 |
| Kimi-VL-A3B | kimi_customize | kimi_direct | kimi_figure | kimi_level2 | kimi_level3 |
For the results obtained from inference, the first step is to check the execution rate, i.e., run every generated script and count the ones that execute cleanly (see the sketch below). The code that runs successfully, together with its generated images, then goes through three evaluations: base_evaluation, LLM_evaluation, and LMM_evaluation.
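A minimal sketch of what the execution-rate check amounts to, assuming the generated scripts sit in one directory; the actual logic in `execute_evaluate.sh` may differ:
```python
# Hypothetical sketch of an execution-rate check; not the repo's actual code.
import pathlib
import subprocess

def execution_rate(code_dir: str, timeout: int = 60) -> float:
    """Fraction of generated scripts that run to completion without error."""
    scripts = sorted(pathlib.Path(code_dir).glob("*.py"))
    ok = 0
    for script in scripts:
        try:
            result = subprocess.run(["python", str(script)],
                                    capture_output=True, timeout=timeout)
            ok += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass  # hung scripts count as failures
    return ok / len(scripts) if scripts else 0.0
```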
```bash
cd scripts/evaluate
# step1: check execution rate
bash execute_evaluate.sh
# step2: run base evaluation
bash base_evaluator.sh
# step3: run LLM evaluation to evaluate the code
bash LLM_evaluator.sh
# step4: run LMM evaluation to evaluate the image
bash LMM_evaluator.sh
```
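As an illustration of the LMM evaluation step, here is a hedged sketch of an LMM-as-judge comparison between a reference and a generated chart. The prompt, judge model, and scoring rubric below are assumptions; the actual ones live in the repo's evaluation scripts, and the endpoint comes from the `.env` setup:
```python
# Hypothetical LMM-as-judge sketch; the repo's actual prompts and scoring
# rubric live in scripts/evaluate, and the judge model here is an assumption.
import base64
import os

from openai import OpenAI

def as_data_url(path: str) -> str:
    """Base64-encode a PNG so it can be sent inline to the API."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"],
                base_url=os.environ["OPENAI_API_URL"])

resp = client.chat.completions.create(
    model="gpt-5",  # assumed judge model
    messages=[{"role": "user", "content": [
        {"type": "text",
         "text": "Rate from 0-10 how faithfully the second chart reproduces the first."},
        {"type": "image_url", "image_url": {"url": as_data_url("reference.png")}},
        {"type": "image_url", "image_url": {"url": as_data_url("generated.png")}},
    ]}],
)
print(resp.choices[0].message.content)
```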
- [2025.10.22] We released our paper on arXiv.
- Special thanks to Henry Hengyuan Zhao for serving as the Project Leader of this paper.
- We are grateful to Lijian Wu and Ziyuan Zhen for their hard work on data annotation and baseline testing.
- We also extend our appreciation to Mao Dongxing, Yifei Tao, Lijian Wu, and Wan Yang for their contributions to this work.
If you find Chart2Code useful, please cite it using this BibTeX:
```bibtex
@misc{tang2025chartscodehierarchicalbenchmark,
      title={From Charts to Code: A Hierarchical Benchmark for Multimodal Models},
      author={Jiahao Tang and Henry Hengyuan Zhao and Lijian Wu and Yifei Tao and Dongxing Mao and Yang Wan and Jingru Tan and Min Zeng and Min Li and Alex Jinpeng Wang},
      year={2025},
      eprint={2510.17932},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2510.17932},
}
```
