This repository provides the source code for the paper "Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models".
The project supports multiple LLM backends including GPT-4o, GPT-4o-mini, and Qwen2-VL. VELM enables anomaly classification by leveraging the visual understanding capabilities of multimodal LLMs. The system can:
- Process images of industrial objects
- Detect and localize anomalies using a vision expert
- Generate red contour lines based on the anomaly localization
- Classify different types of anomalies
- Distinguish between negligible anomalies and critical defects
- Evaluate model performance using various metrics
```
VELM/
├── configs/                  # Configuration files
│   ├── prompts/              # Preprocessed prompts for different datasets
│   ├── predictions/          # Model predictions
│   ├── evaluations/          # Evaluation results
│   ├── mvtec_ad_des.json     # MVTec-AD dataset descriptions
│   ├── mvtec_ac_des.json     # MVTec-AC dataset descriptions
│   └── visa_ac_des.json      # VisA-AC dataset descriptions
├── datasets/                 # Datasets (download links are provided)
│   ├── mvtec_ad              # MVTec-AD dataset
│   ├── mvtec_ac              # MVTec-AC dataset
│   └── visa_ac               # VisA-AC dataset
├── utils.py                  # Common utility functions
├── ddad_reorganizer.py       # Match the output of DDAD to the expected directory structure
├── create_contour.py         # Draw contour lines on the query image based on detected anomalies
├── generate_prompts.py       # Preprocess prompts
├── run_llm_hm.py             # Run LLM models with heatmap visualization
├── eval.py                   # Evaluate model predictions
└── anomaly_vs_defect.py      # Evaluate negligible anomaly vs. critical defect classification
```
- Clone the repository:

```bash
git clone https://github.com/Sassanmtr/VELM.git
cd VELM
```

- Install dependencies:

```bash
conda create --name velm_env python=3.9
conda activate velm_env
pip install -r requirements.txt
```

- (For experiments with GPT) Set up environment variables. Create a `.env` file in the root directory with your API keys:

```
OPENAI_API_KEY=your_openai_api_key
```
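To double-check that the key is actually visible to Python, a minimal sketch (assuming the key is loaded with `python-dotenv`, which may differ from how the repository's scripts load it) is:

```python
# Minimal sketch: verify that the OpenAI key in .env is visible to Python.
# Assumes python-dotenv is installed; the repository may load the key differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
key = os.getenv("OPENAI_API_KEY")
print("OPENAI_API_KEY found" if key else "OPENAI_API_KEY missing")
```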
The framework supports the following datasets:
- MVTec-AD: A dataset for unsupervised anomaly detection
- MVTec-AC: A dataset for anomaly classification
- VisA-AC: A dataset for anomaly classification
- MVTec-AD: Download and place in the `datasets/mvtec_ad` folder
- MVTec-AC: Download and place in the `datasets/mvtec_ac` folder

  Note: MVTec-AC uses the same training set as MVTec-AD. You can copy the train folder from `mvtec_ad` to `mvtec_ac` if needed (see the sketch after this list).
- VisA-AC: Download and place in the `datasets/visa_ac` folder

  Note: VisA-AC uses the same training set as VisA. If VisA is already downloaded, you can reuse its train folder.
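If you want to copy the shared training split rather than re-download it, a minimal sketch (assuming the usual per-category `<dataset>/<category>/train/` layout; paths are illustrative) is:

```python
# Sketch: copy train/ folders from datasets/mvtec_ad to datasets/mvtec_ac.
# Assumes both datasets share the per-category layout <dataset>/<category>/train/.
import shutil
from pathlib import Path

src_root = Path("datasets/mvtec_ad")
dst_root = Path("datasets/mvtec_ac")

for category in sorted(p for p in src_root.iterdir() if p.is_dir()):
    src_train = category / "train"
    dst_train = dst_root / category.name / "train"
    if src_train.is_dir() and not dst_train.exists():
        shutil.copytree(src_train, dst_train)
        print(f"Copied {src_train} -> {dst_train}")
```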
Generate red contour lines from heatmaps to overlay on test images:
```bash
python create_contour.py --dataset mvtec_ac --image_size 448
```

Options:
- `--config`: Path to YAML configuration file (default: `configs/contour_config.yaml`)
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac); overrides the config
- `--image_size`: Resize input images and heatmaps to this size (default: 448)

Note: To use heatmaps generated by other methods, update the `heatmap_dir` path in the configuration file accordingly.
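For reference, the core of this step can be sketched with OpenCV: binarize the anomaly heatmap, extract contours, and draw them in red on the query image. The threshold and file paths below are illustrative assumptions, not the defaults used by `create_contour.py`:

```python
# Sketch: overlay red contour lines from an anomaly heatmap onto a query image.
# The threshold (0.5) and file paths are illustrative; create_contour.py may differ.
import cv2

image = cv2.imread("query.png")                                # BGR query image
heatmap = cv2.imread("heatmap.png", cv2.IMREAD_GRAYSCALE)      # anomaly heatmap
heatmap = cv2.resize(heatmap, (image.shape[1], image.shape[0]))

# Binarize the heatmap and find the boundaries of the anomalous regions.
_, mask = cv2.threshold(heatmap, int(0.5 * 255), 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw the contours in red (BGR: 0, 0, 255) on top of the query image.
overlay = image.copy()
cv2.drawContours(overlay, contours, -1, (0, 0, 255), thickness=2)
cv2.imwrite("query_contour.png", overlay)
```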
Generate prompts for the LLM to perform anomaly classification:
```bash
python generate_prompts.py --dataset mvtec_ac --text_type conditioned --ddad_format True
```

Options:
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--text_type`: Type of text to generate (`raw`: reference and query images; `conditioned`: reference, contour, and query images)
- `--ddad_format`: Whether to use DDAD format (True/False)
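Conceptually, a `conditioned` prompt pairs a defect-free reference image with the contour-annotated query image plus instruction text. A minimal sketch of assembling such a prompt (the wording and dictionary layout are illustrative, not the templates produced by `generate_prompts.py`) is:

```python
# Sketch: assemble a "conditioned" prompt from a reference image and a
# contour-annotated query image. Wording and structure are illustrative only;
# generate_prompts.py builds its own templates from the *_des.json files.
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its base64 string for a multimodal API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

prompt = {
    "text": (
        "The first image shows a defect-free reference object. "
        "The second image is a query in which the suspected anomaly is "
        "outlined with a red contour. Name the anomaly category."
    ),
    "images": [encode_image("reference.png"), encode_image("query_contour.png")],
}
```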
Run LLM models for anomaly detection with heatmap visualization:
```bash
# Using GPT-4o (default)
python run_llm.py --model gpt --dataset mvtec_ad --heatmap_mode contour

# Using GPT-4o-mini
python run_llm.py --model gpt --gpt_model gpt-4o-mini --dataset mvtec_ad --heatmap_mode contour

# Using Qwen2-VL
python run_llm.py --model qwen --dataset mvtec_ac --heatmap_mode contour
```

Options:
- `--model`: Model to use (gpt, qwen)
- `--gpt_model`: GPT model to use (gpt-4o, gpt-4o-mini); only applicable if `--model gpt`
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--image_size`: Size to resize images to (default: 448)
- `--num_ref`: Number of reference images to use (default: 1)
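For the GPT backend, each query amounts to a multimodal chat completion that sends the prompt text together with base64-encoded images. A minimal sketch with the `openai` Python client (the prompt text and image paths are illustrative, not the exact code in the run script) is:

```python
# Sketch: send one conditioned prompt (text + images) to GPT-4o via the openai
# client. Assumes OPENAI_API_KEY is set; prompt content and paths are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def image_part(path: str) -> dict:
    """Encode a local image as a base64 data URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Classify the anomaly outlined in red in the second image."},
            image_part("reference.png"),
            image_part("query_contour.png"),
        ],
    }],
)
print(response.choices[0].message.content)
```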
Evaluate model predictions:
```bash
python eval.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--output`: Path to save evaluation results (optional)
- `--verbose`: Enable verbose logging
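The central metric is classification accuracy, reported per object category and overall. A sketch of that computation (the prediction-file layout assumed here, a list of `{"category", "prediction", "label"}` records, is an illustrative assumption, not the format written by the run script) is:

```python
# Sketch: per-category and overall accuracy from a predictions file.
# The JSON layout (a list of {"category", "prediction", "label"} records)
# and the file path are assumptions for illustration; eval.py defines the real format.
import json
from collections import defaultdict

with open("configs/predictions/mvtec_ad_gpt-4o_contour.json") as f:  # illustrative path
    records = json.load(f)

hits, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["category"]] += 1
    hits[r["category"]] += int(r["prediction"] == r["label"])

per_category = {c: hits[c] / totals[c] for c in totals}
overall = sum(hits.values()) / sum(totals.values())
print(json.dumps({"per_category": per_category, "overall": overall}, indent=2))
```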
Evaluate model performance in distinguishing between critical defects and negligible anomalies:
```bash
python anomaly_vs_defect.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--seeds`: Number of random seeds to use for evaluation (default: 5)
- `--output`: Path to save evaluation results (default: anom_def_results.json)
- `--verbose`: Enable verbose logging
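The reported numbers are the mean and standard deviation of accuracy across the random seeds. A sketch of the aggregation step (the per-seed accuracy values below are placeholders, not real results) is:

```python
# Sketch: summarize anomaly-vs-defect accuracy across seeded runs.
# The per-seed accuracy values are placeholders for illustration only.
import statistics

per_seed_accuracy = [0.0, 0.0, 0.0, 0.0, 0.0]  # fill with one accuracy per seed
mean = statistics.mean(per_seed_accuracy)
std = statistics.stdev(per_seed_accuracy)
print(f"accuracy: {mean:.3f} ± {std:.3f} over {len(per_seed_accuracy)} seeds")
```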
Evaluation results are saved in JSON format and include:
- Accuracy per object category
- Standard deviation of accuracy
- Overall accuracy metrics
- Confusion matrices
This project is licensed under the MIT License - see the LICENSE file for details.
```bibtex
@article{mokhtar2025detect,
  title={Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models},
  author={Mokhtar, Sassan and Mousakhan, Arian and Galesso, Silvio and Tayyub, Jawad and Brox, Thomas},
  journal={arXiv preprint arXiv:2505.02626},
  year={2025}
}
```
For any feedback or inquiries, please contact [email protected]