This repository provides the source code for the paper "Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models".
The project supports multiple LLM backends including GPT-4o, GPT-4o-mini, and Qwen2-VL. VELM enables anomaly classification by leveraging the visual understanding capabilities of multimodal LLMs. The system can:
- Process images of industrial objects
- Detect and localize anomalies using a vision expert
- Generate red contour lines based on the anomaly localization
- Classify different types of anomalies
- Distinguish between negligible anomalies and critical defects
- Evaluate model performance using various metrics
```
VELM/
├── configs/                  # Configuration files
│   ├── prompts/              # Preprocessed prompts for different datasets
│   ├── predictions/          # Model predictions
│   ├── evaluations/          # Evaluation results
│   ├── mvtec_ad_des.json     # MVTec-AD dataset descriptions
│   ├── mvtec_ac_des.json     # MVTec-AC dataset descriptions
│   └── visa_ac_des.json      # VisA-AC dataset descriptions
├── datasets/                 # Datasets (download links are provided)
│   ├── mvtec_ad              # MVTec-AD dataset
│   ├── mvtec_ac              # MVTec-AC dataset
│   └── visa_ac               # VisA-AC dataset
├── utils.py                  # Common utility functions
├── ddad_reorganizer.py       # Match the output of DDAD to the expected directory structure
├── create_contour.py         # Draw contour lines on the query image based on detected anomalies
├── generate_prompts.py       # Preprocess prompts
├── run_llm_hm.py             # Run LLM models with heatmap visualization
├── eval.py                   # Evaluate model predictions
└── anomaly_vs_defect.py      # Evaluate negligible anomaly vs. critical defect classification
```
- Clone the repository:

```bash
git clone https://github.com/Sassanmtr/VELM.git
cd VELM
```

- Install dependencies:

```bash
conda create --name velm_env python=3.9
conda activate velm_env
pip install -r requirements.txt
```

- (For experiments with GPT) Set up environment variables. Create a `.env` file in the root directory with your API keys:

```
OPENAI_API_KEY=your_openai_api_key
```
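To double-check that the key is actually visible to Python, a minimal sketch (assuming the key is loaded with `python-dotenv`, which may differ from how the repository's scripts load it) is:

```python
# Minimal sketch: verify that the OpenAI key in .env is visible to Python.
# Assumes python-dotenv is installed; the repository may load the key differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
key = os.getenv("OPENAI_API_KEY")
print("OPENAI_API_KEY found" if key else "OPENAI_API_KEY missing")
```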
The framework supports the following datasets:
- MVTec-AD: A dataset for unsupervised anomaly detection
- MVTec-AC: A dataset for anomaly classification
- VisA-AC: A dataset for anomaly classification
- MVTec-AD: Download and place in the `datasets/mvtec_ad` folder
- MVTec-AC: Download and place in the `datasets/mvtec_ac` folder

  Note: MVTec-AC uses the same training set as MVTec-AD. You can copy the train folder from `mvtec_ad` to `mvtec_ac` if needed (see the sketch after this list).
- VisA-AC: Download and place in the `datasets/visa_ac` folder

  Note: VisA-AC uses the same training set as VisA. If VisA is already downloaded, you can reuse its train folder.
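If you want to copy the shared training split rather than re-download it, a minimal sketch (assuming the usual per-category `<dataset>/<category>/train/` layout; paths are illustrative) is:

```python
# Sketch: copy train/ folders from datasets/mvtec_ad to datasets/mvtec_ac.
# Assumes both datasets share the per-category layout <dataset>/<category>/train/.
import shutil
from pathlib import Path

src_root = Path("datasets/mvtec_ad")
dst_root = Path("datasets/mvtec_ac")

for category in sorted(p for p in src_root.iterdir() if p.is_dir()):
    src_train = category / "train"
    dst_train = dst_root / category.name / "train"
    if src_train.is_dir() and not dst_train.exists():
        shutil.copytree(src_train, dst_train)
        print(f"Copied {src_train} -> {dst_train}")
```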
Generate red contour lines from heatmaps to overlay on test images:
```bash
python create_contour.py --dataset mvtec_ac --image_size 448
```

Options:
- `--config`: Path to YAML configuration file (default: `configs/contour_config.yaml`)
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac); overrides the config
- `--image_size`: Resize input images and heatmaps to this size (default: 448)

Note: To use heatmaps generated by other methods, update the `heatmap_dir` path in the configuration file accordingly.
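For reference, the core of this step can be sketched with OpenCV: binarize the anomaly heatmap, extract contours, and draw them in red on the query image. The threshold and file paths below are illustrative assumptions, not the defaults used by `create_contour.py`:

```python
# Sketch: overlay red contour lines from an anomaly heatmap onto a query image.
# The threshold (0.5) and file paths are illustrative; create_contour.py may differ.
import cv2

image = cv2.imread("query.png")                                # BGR query image
heatmap = cv2.imread("heatmap.png", cv2.IMREAD_GRAYSCALE)      # anomaly heatmap
heatmap = cv2.resize(heatmap, (image.shape[1], image.shape[0]))

# Binarize the heatmap and find the boundaries of the anomalous regions.
_, mask = cv2.threshold(heatmap, int(0.5 * 255), 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw the contours in red (BGR: 0, 0, 255) on top of the query image.
overlay = image.copy()
cv2.drawContours(overlay, contours, -1, (0, 0, 255), thickness=2)
cv2.imwrite("query_contour.png", overlay)
```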
Generate prompts for the LLM to perform anomaly classification:
```bash
python generate_prompts.py --dataset mvtec_ac --text_type conditioned --ddad_format True
```

Options:
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--text_type`: Type of text to generate (`raw`: reference and query images; `conditioned`: reference, contour, and query images)
- `--ddad_format`: Whether to use DDAD format (True/False)
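Conceptually, a `conditioned` prompt pairs a defect-free reference image with the contour-annotated query image plus instruction text. A minimal sketch of assembling such a prompt (the wording and dictionary layout are illustrative, not the templates produced by `generate_prompts.py`) is:

```python
# Sketch: assemble a "conditioned" prompt from a reference image and a
# contour-annotated query image. Wording and structure are illustrative only;
# generate_prompts.py builds its own templates from the *_des.json files.
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its base64 string for a multimodal API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

prompt = {
    "text": (
        "The first image shows a defect-free reference object. "
        "The second image is a query in which the suspected anomaly is "
        "outlined with a red contour. Name the anomaly category."
    ),
    "images": [encode_image("reference.png"), encode_image("query_contour.png")],
}
```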
Run LLM models for anomaly detection with heatmap visualization:
```bash
# Using GPT-4o (default)
python run_llm.py --model gpt --dataset mvtec_ad --heatmap_mode contour

# Using GPT-4o-mini
python run_llm.py --model gpt --gpt_model gpt-4o-mini --dataset mvtec_ad --heatmap_mode contour

# Using Qwen2-VL
python run_llm.py --model qwen --dataset mvtec_ac --heatmap_mode contour
```

Options:
- `--model`: Model to use (gpt, qwen)
- `--gpt_model`: GPT model to use (gpt-4o, gpt-4o-mini); only applicable if `--model gpt`
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--image_size`: Size to resize images to (default: 448)
- `--num_ref`: Number of reference images to use (default: 1)
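For the GPT backend, each query amounts to a multimodal chat completion that sends the prompt text together with base64-encoded images. A minimal sketch with the `openai` Python client (the prompt text and image paths are illustrative, not the exact code in the run script) is:

```python
# Sketch: send one conditioned prompt (text + images) to GPT-4o via the openai
# client. Assumes OPENAI_API_KEY is set; prompt content and paths are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def image_part(path: str) -> dict:
    """Encode a local image as a base64 data URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Classify the anomaly outlined in red in the second image."},
            image_part("reference.png"),
            image_part("query_contour.png"),
        ],
    }],
)
print(response.choices[0].message.content)
```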
Evaluate model predictions:
```bash
python eval.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--output`: Path to save evaluation results (optional)
- `--verbose`: Enable verbose logging
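The central metric is classification accuracy, reported per object category and overall. A sketch of that computation (the prediction-file layout assumed here, a list of `{"category", "prediction", "label"}` records, is an illustrative assumption, not the format written by the run script) is:

```python
# Sketch: per-category and overall accuracy from a predictions file.
# The JSON layout (a list of {"category", "prediction", "label"} records)
# and the file path are assumptions for illustration; eval.py defines the real format.
import json
from collections import defaultdict

with open("configs/predictions/mvtec_ad_gpt-4o_contour.json") as f:  # illustrative path
    records = json.load(f)

hits, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["category"]] += 1
    hits[r["category"]] += int(r["prediction"] == r["label"])

per_category = {c: hits[c] / totals[c] for c in totals}
overall = sum(hits.values()) / sum(totals.values())
print(json.dumps({"per_category": per_category, "overall": overall}, indent=2))
```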
Evaluate model performance in distinguishing between critical defects and negligible anomalies:
```bash
python anomaly_vs_defect.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--seeds`: Number of random seeds to use for evaluation (default: 5)
- `--output`: Path to save evaluation results (default: anom_def_results.json)
- `--verbose`: Enable verbose logging
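The reported numbers are the mean and standard deviation of accuracy across the random seeds. A sketch of the aggregation step (the per-seed accuracy values below are placeholders, not real results) is:

```python
# Sketch: summarize anomaly-vs-defect accuracy across seeded runs.
# The per-seed accuracy values are placeholders for illustration only.
import statistics

per_seed_accuracy = [0.0, 0.0, 0.0, 0.0, 0.0]  # fill with one accuracy per seed
mean = statistics.mean(per_seed_accuracy)
std = statistics.stdev(per_seed_accuracy)
print(f"accuracy: {mean:.3f} ± {std:.3f} over {len(per_seed_accuracy)} seeds")
```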
Evaluation results are saved in JSON format and include:
- Accuracy per object category
- Standard deviation of accuracy
- Overall accuracy metrics
- Confusion matrices
This project is licensed under the MIT License - see the LICENSE file for details.
```bibtex
@article{mokhtar2025detect,
  title={Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models},
  author={Mokhtar, Sassan and Mousakhan, Arian and Galesso, Silvio and Tayyub, Jawad and Brox, Thomas},
  journal={arXiv preprint arXiv:2505.02626},
  year={2025}
}
```
For any feedback or inquiries, please contact [email protected]