Efficient and lightweight Vision-Language model for Visual Question Answering in autonomous driving scenarios. The approach replaces images in BLIP's architecture with spatio-temporal BEV feature maps

BaranEkin/BEVBlip

Visual Question Answering for Traffic Environment Understanding

BEVBlip is an efficient and lightweight Vision-Language Model (VLM) based on the BLIP architecture, trained for the comprehensive Visual Question Answering (VQA) task introduced by DriveLM on the nuScenes dataset. Its core idea is to use spatio-temporal Bird’s Eye View (BEV) maps produced by BEVFormer as visual features and to fuse them with language features for better traffic environment understanding. To align the BEV features with language, the model first goes through a pre-training stage on GPT-generated data.

Example Results

image image

Implementation

High-level outline of the proposed approach:

image

Pre-training

The architecture of the pre-training model:

image
The bottom section illustrates the offline data generation steps using BEVFormer and GPT-3.5. The upper right section shows the unified multimodal encoder-decoder, initialized with pretrained weights from BLIP. The upper left section depicts the compact vision transformer, which is trained from scratch on BEV feature maps.
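
The compact vision transformer consumes BEV feature maps rather than image pixels. As a rough illustration of how such a map might be turned into a token sequence, the sketch below splits an H x W grid of C-dimensional BEV features into non-overlapping patches and flattens each patch into one token. The function name `bev_to_patch_tokens`, the patch size, and the toy dimensions are assumptions made for this example; the repository's actual tokenization may differ.

```python
from typing import List

BevMap = List[List[List[float]]]  # [H][W][C] grid of BEV feature vectors


def bev_to_patch_tokens(bev: BevMap, patch: int) -> List[List[float]]:
    """Split a BEV map into non-overlapping patch x patch tokens (row-major order).

    Hypothetical helper for illustration only, not the repository's code.
    """
    height, width = len(bev), len(bev[0])
    if height % patch or width % patch:
        raise ValueError("BEV height and width must be divisible by the patch size")

    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            # Flatten every cell of the patch into a single token vector.
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    token.extend(bev[row][col])
            tokens.append(token)
    return tokens


# Toy 4 x 4 map with 2 channels; each cell stores its own [row, col] index.
toy_bev = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
tokens = bev_to_patch_tokens(toy_bev, patch=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 2 = 8
```

In a real model, each flattened token would then pass through a learned linear projection and receive a positional embedding before entering the transformer layers.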

Visual Question Answering

The architecture of the VQA model used for fine-tuning on the DriveLM task:

image
The left section shows the vision transformer, initialized with the weights from the pre-training stage. The right section illustrates how the text encoder and text decoder are reconfigured as the question encoder and answer decoder, respectively.
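
Carrying the vision transformer weights over from pre-training can be pictured as a filtered copy between two parameter dictionaries. The sketch below is a minimal, framework-free illustration: parameters are plain lists, and the key names (`visual_encoder.`, `text_encoder.`, `itm_head`) and the function `init_from_pretraining` are hypothetical, not taken from the repository. How the question encoder and answer decoder get their initial weights is not covered here.

```python
from typing import Dict, List, Tuple

StateDict = Dict[str, List[float]]  # parameter name -> flat list of weights


def init_from_pretraining(
    model_state: StateDict, pretrained_state: StateDict, prefix: str
) -> Tuple[StateDict, List[str]]:
    """Copy pre-trained entries whose names start with `prefix` into the model.

    Hypothetical helper: entries missing from the model, or with a different
    size, are skipped so that the rest of the model keeps its own weights.
    """
    merged = dict(model_state)
    loaded = []
    for name, weights in pretrained_state.items():
        same_shape = name in merged and len(merged[name]) == len(weights)
        if name.startswith(prefix) and same_shape:
            merged[name] = list(weights)
            loaded.append(name)
    return merged, loaded


# Toy parameter dictionaries with made-up names and values.
vqa_state = {"visual_encoder.patch_embed": [0.0, 0.0], "text_encoder.layer0": [0.5]}
pretrained = {
    "visual_encoder.patch_embed": [1.0, 2.0],
    "text_encoder.layer0": [9.0],
    "itm_head": [3.0],
}

merged, loaded = init_from_pretraining(vqa_state, pretrained, prefix="visual_encoder.")
print(loaded)  # only the vision transformer entry is copied
```

With a deep learning framework, the same idea is usually expressed by filtering a checkpoint's state dictionary before loading it into the model in non-strict mode.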

Acknowledgement

Sources and references:
