SGP-Bench

This repository contains the official evaluation code of SGP-Bench.

🎉 We are happy to announce that our paper Can Large Language Models Understand Symbolic Graphics Programs? has been selected as a spotlight at ICLR 2025. 🎉

[Teaser image]

Project Link: https://sgp-bench.github.io

Paper Link: https://arxiv.org/abs/2408.08313

Data

The data used for evaluation and instruction tuning is downloaded automatically when you call load_dataset from the Hugging Face datasets library.

from datasets import load_dataset

# SGP-Bench (set `split` to the benchmark split you want to evaluate)
load_dataset('sgp-bench/sgp-bench', split=split)
load_dataset('sgp-bench/sgp-mnist', split='mnist')

# SIT data
load_dataset('sgp-bench/sit_10k')
load_dataset('sgp-bench/sit_25k')
load_dataset('sgp-bench/sit_40k')
load_dataset('sgp-bench/sit_55k')
load_dataset('sgp-bench/sit_72k')
load_dataset('sgp-bench/rev_sit_10k')
load_dataset('sgp-bench/rev_sit_25k')
load_dataset('sgp-bench/rev_sit_40k')
load_dataset('sgp-bench/rev_sit_55k')
load_dataset('sgp-bench/rev_sit_72k')
load_dataset('sgp-bench/mixed_sit_10k')
load_dataset('sgp-bench/mixed_sit_25k')
load_dataset('sgp-bench/mixed_sit_40k')
load_dataset('sgp-bench/mixed_sit_55k')
load_dataset('sgp-bench/mixed_sit_72k')

We also provide the data as CSV files at the following link.
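
As a quick sanity check, you can load one subset and inspect a record. A minimal sketch, assuming the split names mirror the evaluation tasks (e.g., svg); check column_names for the actual schema:

from datasets import load_dataset

# Load the SVG subset of SGP-Bench; 'svg' as a split name is an
# assumption based on the evaluation tasks listed below.
ds = load_dataset('sgp-bench/sgp-bench', split='svg')
print(len(ds))           # number of questions in the split
print(ds.column_names)   # inspect the actual schema
print(ds[0])             # look at one raw record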

Installation

Run the following commands to clone the repository and set up the conda environment.

git clone https://github.com/sgp-bench/sgp-bench.git
cd sgp-bench
conda env create -f environment.yml
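
After the environment is created, activate it before running any evaluation. The environment name below is an assumption; the actual name is defined in environment.yml:

conda env list            # list environments to find the one just created
conda activate sgp-bench  # assumed name; use the name from environment.yml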

Usage

Evaluating closed-source models

We provide examples of evaluating the OpenAI and Claude APIs:

python -m sgp-bench.demo --api $API --model_path $MODEL_PATH --eval $EVAL

# GPT
python -m sgp-bench.demo --api openai-4o

# Claude
python -m sgp-bench.demo --api claude-3.5-sonnet
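
Both commands assume your API credentials are available to the client libraries. The environment variables below are the standard ones for the OpenAI and Anthropic Python clients; whether sgp-bench reads exactly these is an assumption, so check the repository's configuration:

export OPENAI_API_KEY=...     # used by the OpenAI client (assumed setup)
export ANTHROPIC_API_KEY=...  # used by the Anthropic client (assumed setup)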

Evaluating open-source models with vLLM

We use vLLM as the framework for evaluating open-source LLMs; for more information, please refer to vLLM's OpenAI Compatible Server documentation.

python -m sgp-bench.demo --base_url $BASE_URL --api $API --model $MODEL --eval $EVAL

# example usage Llama 3 8B
python -m sgp-bench.demo --base_url http://172.22.8.4:8000/v1 --api llama3-8B --model meta-llama/Meta-Llama-3-8B --eval svg cad

Note the following arguments:

  • $BASE_URL: the address of the server where the open-source LLM is running; you can use the Linux command hostname -i to get the IP address of the remote machine.
  • $API: an identifier for the open-source LLM.
  • $MODEL_PATH: the model path, either a local path where the model is stored or the Hugging Face model name, e.g., meta-llama/Meta-Llama-3-8B.
  • $EVAL: the task(s) to evaluate; possible tasks are svg, cad, inv and mnist. For more information, please refer to the paper.
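
Under the hood, the evaluation client talks to the server through the OpenAI-compatible API. A minimal sketch of such a request (illustrative only; the actual prompting and scoring logic lives in sgp-bench.demo):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server; the
# api_key must match the --api-key the server was started with.
client = OpenAI(base_url='http://172.22.8.4:8000/v1', api_key='token-abc123')
response = client.chat.completions.create(
    model='meta-llama/Meta-Llama-3-8B',
    messages=[{'role': 'user', 'content': 'What shape does this SVG code draw? ...'}],
)
print(response.choices[0].message.content)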

You have to set up the vLLM server at $BASE_URL beforehand. Below are example commands for serving the open-source models we evaluated in the paper:

# llama3-8B
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# llama3-70B
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B-Instruct --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# llama3.1-8B
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# llama3.1-70B
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-70B-Instruct --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# llama3.1-405B
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --dtype auto --api-key token-abc123 --tensor-parallel-size 8 --max-model-len 8192



# gemma-1.1-2b
python -m vllm.entrypoints.openai.api_server --model google/gemma-1.1-2b-it --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# gemma-1.1-7b
python -m vllm.entrypoints.openai.api_server --model google/gemma-1.1-7b-it --dtype auto --api-key token-abc123 --tensor-parallel-size 8




# mistral-7B-v0.3
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3 --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# mistral-codestral-22b-v0.1
python -m vllm.entrypoints.openai.api_server --model mistralai/Codestral-22B-v0.1 --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# mistral-large
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-Large-Instruct-2407 --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# mistral-nemo
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-Nemo-Instruct-2407 --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# c4ai-command-r-v01
python -m vllm.entrypoints.openai.api_server --model CohereForAI/c4ai-command-r-v01 --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# c4ai-command-r-plus
python -m vllm.entrypoints.openai.api_server --model CohereForAI/c4ai-command-r-plus --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# CohereForAI/aya-23-8B
python -m vllm.entrypoints.openai.api_server --model CohereForAI/aya-23-8B --dtype auto --api-key token-abc123 --tensor-parallel-size 8 

# CohereForAI/aya-23-35B
python -m vllm.entrypoints.openai.api_server --model CohereForAI/aya-23-35B --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# starcoder2-15b-instruct-v0.1
python -m vllm.entrypoints.openai.api_server --model bigcode/starcoder2-15b-instruct-v0.1 --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# internlm2_5-7b-chat
python -m vllm.entrypoints.openai.api_server --model internlm/internlm2_5-7b-chat --trust-remote-code --dtype auto --api-key token-abc123  --tensor-parallel-size 8

# internlm2-chat-20b
python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-20b --trust-remote-code --dtype auto --api-key token-abc123  --tensor-parallel-size 8

# internlm2-chat-7b
python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-7b --trust-remote-code --dtype auto --api-key token-abc123  --tensor-parallel-size 8



# qwen-1.5-7B
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-7B-Chat --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# qwen-1.5-32B
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-32B-Chat --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# qwen-1.5-72B
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-72B-Chat --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# qwen-1.5-110B
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-110B-Chat --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# codeqwen-1.5-7B
python -m vllm.entrypoints.openai.api_server --model Qwen/CodeQwen1.5-7B-Chat --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# qwen-2-72B
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-72B-Instruct --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# Yi-1.5-9B
python -m vllm.entrypoints.openai.api_server --model 01-ai/Yi-1.5-9B-Chat-16K --dtype auto --api-key token-abc123 --tensor-parallel-size 8

# Yi-1.5-34B
python -m vllm.entrypoints.openai.api_server --model 01-ai/Yi-1.5-34B-Chat-16K --dtype auto --api-key token-abc123 --tensor-parallel-size 8



# DeepSeek-Coder-V2-16B
python -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code  --dtype auto --api-key token-abc123 --tensor-parallel-size 8
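
Once a server is up, you can verify that it is reachable before launching the evaluation. Listing the served models is a standard endpoint of vLLM's OpenAI-compatible server; adjust the host, port, and token to your setup:

# Query the /v1/models endpoint; the token must match the server's --api-key.
curl http://localhost:8000/v1/models -H 'Authorization: Bearer token-abc123'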

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Zeju Qiu - [email protected]

Citation

If you find our project interesting, please cite us 😊

@article{qiu2024can,
  title={Can Large Language Models Understand Symbolic Graphics Programs?},
  author={Qiu, Zeju and Liu, Weiyang and Feng, Haiwen and Liu, Zhen and Xiao, Tim Z and Collins, Katherine M and Tenenbaum, Joshua B and Weller, Adrian and Black, Michael J and Sch{\"o}lkopf, Bernhard},
  journal={arXiv preprint arXiv:2408.08313},
  year={2024}
}

Acknowledgements

This project is based on the open-source repository available at https://github.com/openai/simple-evals. We thank OpenAI for providing the base implementation.
