🏠Home Page • 🤗 Dataset
```bash
conda create -n "vtikz" python=3.12.7
pip install -r requirements.txt
```

To execute the evaluation for any subset, use the following script:
```bash
python3 -m vtikz.run_evaluation [-h] --subsets SUBSETS [SUBSETS ...] --metrics METRICS [METRICS ...] --agent AGENT --model MODEL [--vlm VLM] [--vlm_api_url VLM_API_URL] [--vlm_api_key VLM_API_KEY] [--vlm-temperature VLM_TEMPERATURE] [--interaction-amount INTERACTION_AMOUNT] [--run-model] [--api_url API_URL] [--api_key API_KEY] [--temperature TEMPERATURE] [--passk PASSK]
```

> [!IMPORTANT]
> Agents are separated into two categories, internal and external. External agents can only be configured through the `config-vtikz.cfg` file; see the External Agent section.
- `--subsets`, `-s`: Subset(s) to evaluate the model on. Defaults to `["tikz"]`; future subsets could include svg, p5js, pygame, etc.
- `--metrics`, `-me`: List of metrics for evaluation. Defaults to `["Template", "ImageEquality", "line", "crystalBleuPatch"]`.
- `--agent`, `-a`: Name of the agent to use. Required. Choices: `["simpleLLM", "simpleLMM", "loopVLMLLM", "loopLMM", "FAR", "VIF"]`.
- `--model`, `-m`: Name of the model to evaluate. Required.
- `--run-model`, `-r`: Launch the model locally for evaluation. If used together with `--api_url`, the `api_url` is ignored.
- `--api_url`: URL of the OpenAI-compatible API.
- `--api_key`: API key for authentication. Defaults to the environment variable `OPENAI_API_KEY`.
- `--temperature`: Sampling temperature for the model. Defaults to `0.7`.
- `--passk`: Number of responses per prompt for computing pass@k. Defaults to `1`.
- `--vlm`, `-v`: Name of the VLM to use.
- `--vlm_api_url`: URL of the OpenAI-compatible API for the VLM.
- `--vlm_api_key`: API key for VLM authentication. Defaults to the environment variable `OPENAI_API_KEY`.
- `--vlm-temperature`: Sampling temperature for the VLM. Defaults to `0`.
- `--interaction-amount`: Number of interactions between the LLM and VLM. Defaults to `2`.
- Results are saved in the `./results` directory under subfolders based on the model name. Each subset generates:
  - A `.json` file summarizing evaluation scores.
  - Evaluation datasets stored for further analysis or sharing.
All results are currently published to https://huggingface.co/datasets/CharlyR/vtikz-evaluation; access is needed to publish new results. The datasets are analysed using this notebook.
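For a quick look at the published results, here is a minimal sketch using the `datasets` library. It assumes the dataset's default configuration; if the dataset defines one configuration per subset, pass the configuration name you need, and log in first (e.g. `huggingface-cli login`) if access is restricted.

```python
# Minimal sketch: load the published evaluation results from the Hugging Face Hub.
# The default configuration is assumed; adjust the config/split to the subset you need.
from datasets import load_dataset

results = load_dataset("CharlyR/vtikz-evaluation")
print(results)
```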
Some additional configurations can be set up by adding a `config-vtikz.cfg` file at the root of the project (an example configuration is provided here).
Each section is described below.
Parameters for the VIF-Agent. See the documentation here.
| Parameter | Description | Example Value |
|---|---|---|
| `api_url` | Base URL for the main model API. | `https://api.openai.com/v1` |
| `api_key` | API key for authenticating requests to the main model. | `your_api_key` |
| `model` | Name of the main model to use. | `gpt-4` |
| `search_api_url` | Base URL for the search model API (if used). | `https://api.openai.com/v1` |
| `search_api_key` | API key for authenticating requests to the search model. | `your_search_api_key` |
| `search_model` | Name of the search model to use (if applicable). | `gpt-4-turbo` |
| `identification_api_url` | Base URL for the identification model API (if used). | `https://api.openai.com/v1` |
| `identification_api_key` | API key for authenticating requests to the identification model. | `your_identification_api_key` |
| `identification_model` | Name of the identification model to use (if applicable). | `gpt-3.5-turbo` |
| `temperature` | Sampling temperature for model inference (0 = deterministic). | `0.0` |
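As an illustration, these parameters could be laid out in `config-vtikz.cfg` roughly as follows; the section header `vif-agent` is an assumed placeholder, so refer to the linked example configuration for the exact naming:

```ini
# The section name "vif-agent" is an assumed placeholder.
[vif-agent]
api_url = https://api.openai.com/v1
api_key = your_api_key
model = gpt-4
search_api_url = https://api.openai.com/v1
search_api_key = your_search_api_key
search_model = gpt-4-turbo
identification_api_url = https://api.openai.com/v1
identification_api_key = your_identification_api_key
identification_model = gpt-3.5-turbo
temperature = 0.0
```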
If you want to run the model locally, all the parameters can be set in this section, for example `trust-remote-code = True`.
See this documentation for reference.
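For example, a minimal sketch of such a section (the header name `vllm` is an assumed placeholder):

```ini
# The section name "vllm" is an assumed placeholder for the local-model settings.
[vllm]
trust-remote-code = True
```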
The CLIP model for the clip metric is launched using the open_clip library.
The `model_name` (e.g., `ViT-bigG-14-quickgelu`), `pretrained_name` (e.g., `metaclip_fullcc`) and `force_cpu` parameters can be set.
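For illustration (the header name `clip` and the `force_cpu` value are assumed placeholders):

```ini
# The section name "clip" is an assumed placeholder.
[clip]
model_name = ViT-bigG-14-quickgelu
pretrained_name = metaclip_fullcc
force_cpu = False   # assumed default; set to True to force CPU inference
```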
General settings.
- `cache_enabled`: Enables caching (`True` or `False`).
- `cache_location`: Cache directory (e.g., `.cache`).
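For illustration, a minimal sketch of this section (the header name `general` is an assumed placeholder):

```ini
# The section name "general" is an assumed placeholder.
[general]
cache_enabled = True
cache_location = .cache
```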
API configuration.
- `seed`: Random seed for reproducibility (e.g., `456789`). It is used for the cache and for OpenAI-compatible APIs that support it.
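For illustration (the header name `api` is an assumed placeholder):

```ini
# The section name "api" is an assumed placeholder.
[api]
seed = 456789
```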
Settings for the code correction agent.
- `max_iteration`: Maximum number of correction attempts (e.g., `5`).
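For illustration (the header name `correction` is an assumed placeholder):

```ini
# The section name "correction" is an assumed placeholder.
[correction]
max_iteration = 5
```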
Renderer settings for visual output.
- `p5js_browser_path`: Path to the browser used for rendering the p5.js sketches.
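For illustration (both the header name `renderer` and the browser path below are assumed placeholders):

```ini
# The section name "renderer" and the path below are assumed placeholders.
[renderer]
p5js_browser_path = /usr/bin/chromium
```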
- With the OpenAI API:

  ```bash
  python3 -m vtikz.run_evaluation --subsets tikz svg --model gpt-3.5-turbo --api_key YOUR_API_KEY --agent simpleLLM
  ```
- With another OpenAI-compatible API:

  ```bash
  python3 -m vtikz.run_evaluation --subsets tikz --model llama-3.1-70b-versatile --api_url https://api.groq.com/openai/v1 --api_key $GROQ_API_KEY --temperature 0.7 --passk 5 --agent simpleLLM
  ```
- Using a locally launched model:

  ```bash
  python3 -m vtikz.run_evaluation --subsets tikz --model meta-llama/Llama-3.2-1B-Instruct --run-model --temperature 0.9 --passk 3 --agent simpleLLM
  ```
- With a vLLM API at RunPod:

  ```bash
  python3 -m vtikz.run_evaluation --subsets svg --model meta-llama/Llama-3.2-1B-Instruct --api_url https://api.runpod.ai/YOURAPI/openai/v1 --api_key $RUNPOD_API_KEY --vlm llava-hf/llava-1.5-7b-hf --vlm_api_url https://api.runpod.ai/YOURAPI/openai/v1 --vlm_api_key $VLM_RUNPOD_API_KEY --interaction-amount 2 --agent loopVLMLLM
  ```
- With the Groq API:

  ```bash
  python3 -m vtikz.run_evaluation --subsets tikz --model llama-3.1-70b-versatile --api_url https://api.groq.com/openai/v1 --api_key $GROQ_API_KEY --vlm llava-v1.5-7b-4096-preview --vlm_api_url https://api.groq.com/openai/v1 --vlm_api_key $GROQ_API_KEY --interaction-amount 1 --agent loopVLMLLM --passk 1
  ```
- Ensure required environment variables (e.g., `OPENAI_API_KEY` or `HF_TOKEN`) are set if not explicitly passed as arguments.
- For local models, ensure compatibility with the `vllm` framework.
- The script dynamically creates directories for saving results if they do not exist.
The dataset is created from the scripts located in `vtikz/dataset_workflow`.
- Each folder in the dataset is a subset (see the layout sketch below).
- Each subset contains a list of entries, each stored as its own folder.
- Each entry folder contains an input and a `solutions` folder.
- Each entry also has an instruction.
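A rough sketch of this layout; the folder and file names below are placeholders, only the subset/entry/`solutions`/instruction structure comes from the description above:

```text
tikz/                 # one folder per subset (placeholder name)
  some_entry/         # one folder per dataset entry (placeholder name)
    input             # the entry's input (exact file name is a placeholder)
    instruction       # the entry's instruction (exact file name is a placeholder)
    solutions/        # reference solutions for the entry
```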
You can run the following command:
```bash
python3 -m vtikz.dataset_workflow.create_dataset [-h] --dataset DATASET
```

The script first computes the patches for each entry, then adds, commits, and pushes the changes to obtain a commit id, and finally creates the dataset with the instruction, the repo, the commit id, and the patch.
WIP: fix the script and add documentation for the synthetic data generation.
- Environment variables: If `--api_key` is not provided, the script uses the `OPENAI_API_KEY` environment variable.