RAVLLM: Driving Like Humans - Vision Large Language Models for Road Anomaly Detection

Overview

This repository presents the implementation of RAVLLM, a novel framework that leverages Vision Large Language Models (VLLMs) for road anomaly detection in autonomous driving contexts. Our approach demonstrates how modern vision-language models can be effectively adapted to identify and reason about road anomalies through human-like perception and reasoning patterns.

The framework addresses the critical challenge of detecting road anomalies that may not conform to predefined categories, utilizing the emergent reasoning capabilities of VLLMs to achieve robust performance across diverse driving scenarios. Through careful prompt engineering and model adaptation, RAVLLM provides both accurate anomaly detection and interpretable reasoning processes that mirror human decision-making patterns.

Repository Structure

RAVLLM/
├── DataSetConverters/          # Dataset format conversion utilities
│   ├── florence2_converter.py  # Florence-2 format conversion
│   └── palligemma_converter.py # PaliGemma format conversion
├── dataloaders/               # Dataset loading and preprocessing
│   ├── anomaly_loader.py      # Road anomaly dataset loader
│   └── preprocessing.py       # Data preprocessing utilities
├── samples/                   # Example inputs and outputs
│   ├── input_examples/        # Sample road images
│   └── output_examples/       # Model prediction samples
├── scripts/                   # Utility and helper scripts
│   ├── evaluation.py          # Model evaluation utilities
│   └── visualization.py       # Result visualization tools
├── TrainFlorence2OD.OK.py     # Main training script for Florence-2
├── requirements.txt           # Project dependencies
└── README.md                  # Project documentation

Installation and Setup

Prerequisites

Ensure you have Python 3.8 or higher installed on your system.

Installation Steps

Clone the repository:

git clone https://github.com/abdkhanstd/RAVLLM.git
cd RAVLLM

Install dependencies:
```
pip install -r requirements.txt
```

Verify installation:

python -c "import torch; print('PyTorch version:', torch.__version__)"

Usage Guide

Training the Model

To begin training the RAVLLM framework:

python TrainFlorence2OD.OK.py

The training script supports various configuration options that can be modified within the script or passed as command-line arguments.

Dataset Preparation

The framework supports multiple dataset formats through the provided conversion utilities:

For Florence-2 format:

python DataSetConverters/florence2_converter.py --input_dir /path/to/raw/data --output_dir /path/to/converted/data

For PaliGemma format:

python DataSetConverters/palligemma_converter.py --input_dir /path/to/raw/data --output_dir /path/to/converted/data

Evaluation and Testing

Evaluate trained models using the provided evaluation scripts:

python scripts/evaluation.py --model_path /path/to/trained/model --test_data /path/to/test/data

Data and Model Weights

The model weights and related datasets can be accessed from:

Weights: Pre-trained RAVLLM models for road anomaly detection
Datasets: Curated road anomaly datasets with annotations

Dataset Format

The framework expects input datasets to follow the standardized format for road anomaly detection tasks. Sample data structures and formatting guidelines are provided in the samples/ directory.

Technical Implementation

Supported Vision Language Models

The current implementation provides support for:

Florence-2: Microsoft's vision-language model optimized for object detection tasks
PaliGemma: Google's efficient vision-language model for multimodal understanding

Model Architecture

The RAVLLM framework implements a multi-modal approach that combines visual feature extraction with language-based reasoning. The architecture leverages pre-trained vision-language models while incorporating domain-specific adaptations for road anomaly detection scenarios.

Experimental Results

Our experimental validation demonstrates that RAVLLM achieves competitive performance on standard road anomaly detection benchmarks while providing interpretable reasoning outputs that facilitate understanding of model decisions. Detailed results and analysis will be included in the full paper publication.

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{shafiq2024driving,
  title={Driving Like Humans: Leveraging Vision Large Language Models for Road Anomaly Detection},
  author={Shafiq, Sidra and Awan, Hassan Moatasam and Khan, Abdullah Aman and Amin, Waqas},
  booktitle={2024 3rd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}

Contributing

We welcome contributions to improve RAVLLM. Please feel free to submit issues and enhancement requests through the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions and collaborations, please reach out to the corresponding authors through the institutional channels or create an issue in this repository.

Note: This repository represents ongoing research in vision-language models for autonomous driving applications. We encourage researchers to build upon this work and contribute to advancing the field of interpretable road anomaly detection.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RAVLLM: Driving Like Humans - Vision Large Language Models for Road Anomaly Detection

Overview

Repository Structure

Installation and Setup

Prerequisites

Installation Steps

Usage Guide

Training the Model

Dataset Preparation

Evaluation and Testing

Data and Model Weights

Dataset Format

Technical Implementation

Supported Vision Language Models

Model Architecture

Experimental Results

Citation

Contributing

License

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
DataSetConverters		DataSetConverters
dataloaders		dataloaders
samples		samples
scripts		scripts
ConvertLFFL.py		ConvertLFFL.py
README.md		README.md
TrainFlorence2OD.OK.py		TrainFlorence2OD.OK.py
requirements.txt		requirements.txt

abdkhanstd/RAVLLM

Folders and files

Latest commit

History

Repository files navigation

RAVLLM: Driving Like Humans - Vision Large Language Models for Road Anomaly Detection

Overview

Repository Structure

Installation and Setup

Prerequisites

Installation Steps

Usage Guide

Training the Model

Dataset Preparation

Evaluation and Testing

Data and Model Weights

Dataset Format

Technical Implementation

Supported Vision Language Models

Model Architecture

Experimental Results

Citation

Contributing

License

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages