A real-time video surveillance system that uses multimodal AI models to detect anomalies (violence and falling-down incidents) in video clips and provides an interactive web interface for analysis and search.
The project consists of four main components. The main web interface, `gradio_app.py`, calls `surveillance.py` and `search.py`.
You can switch to a different model by changing:

```python
from utils.chat_gemini import generate_messages, get_response
```
Key function: `streaming_process_video(clip, output_path, prompt)` processes a single clip and appends its results to the output JSONL file. You can try different prompts here.
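The append-to-JSONL convention above (one JSON object written per clip) can be sketched as follows; `append_result` is an illustrative helper, not project code:

```python
import json

def append_result(output_path, clip_id, sentences):
    """Append one clip's analysis as a single JSON line (hypothetical helper
    illustrating the JSONL convention that streaming_process_video follows)."""
    record = {"clip_id": clip_id, "response": sentences}
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```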
`search.py` searches processed video descriptions and returns the relevant clip ID and an answer.
You can choose from four LLMs to process videos:
- `chat_qwen.py`: uses an SFT model, M3-Memorization.
- `chat_qwen_omni.py`: uses Qwen2.5-Omni-7B.
- `chat_gpt.py`: uses gpt-4o-mini (you can change it to other GPT models).
- `chat_gemini.py`: uses gemini-2.5-flash.
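Since each backend module exposes the same `generate_messages`/`get_response` pair, switching models amounts to changing one import. A minimal dispatch sketch (the table and `load_backend` are illustrative, not project code):

```python
import importlib

# Module paths come from the project layout; the short names are assumptions.
BACKENDS = {
    "qwen": "utils.chat_qwen",
    "qwen_omni": "utils.chat_qwen_omni",
    "gpt": "utils.chat_gpt",
    "gemini": "utils.chat_gemini",
}

def load_backend(name):
    """Import the chosen chat module and return its two entry points."""
    module = importlib.import_module(BACKENDS[name])
    return module.generate_messages, module.get_response
```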
Required Python packages (install via pip):

```bash
pip install gradio openai transformers torch
```

For Qwen-Omni models (refer to https://github.com/QwenLM/Qwen2.5-Omni):

```bash
pip install -r requirements_web_demo.txt
```

The system can use the Qwen2.5-Omni-7B model for video analysis. Download and set up the model from https://huggingface.co/Qwen/Qwen2.5-Omni-7B.
The model should be located at models/Qwen2.5-Omni-7B/ with the following structure:
```
models/Qwen2.5-Omni-7B/
├── config.json
├── generation_config.json
├── model-*.safetensors
└── ...
```
Note: If you're using other models (GPT, Gemini, or the SFT model), you don't need to download Qwen2.5-Omni-7B. Simply configure the API keys or model paths in the respective utility files.
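Before launching, you can sanity-check that the local download looks complete. A minimal sketch (the heuristic of requiring the config plus at least one weight shard is an assumption, not a project check):

```python
from pathlib import Path

def model_ready(model_dir="models/Qwen2.5-Omni-7B"):
    """Heuristic: the download is usable once config.json and at least
    one model-*.safetensors weight shard are present."""
    d = Path(model_dir)
    return (d / "config.json").is_file() and any(d.glob("model-*.safetensors"))
```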
Create the following directory structure:
```
data/
├── videos/      # Original video files
├── clips/       # 15-second video clips (organized by video ID)
│   ├── gym/
│   ├── warehouse/
│   └── ...
└── results/     # Analysis results (JSONL format)
```
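The layout above can be created programmatically; a small sketch (the `init_data_dirs` helper and its defaults are hypothetical):

```python
from pathlib import Path

def init_data_dirs(root="data", video_ids=("gym", "warehouse")):
    """Create data/videos, data/results, and one clips/<id>/ folder per video."""
    (Path(root) / "videos").mkdir(parents=True, exist_ok=True)
    (Path(root) / "results").mkdir(parents=True, exist_ok=True)
    for vid in video_ids:
        (Path(root) / "clips" / vid).mkdir(parents=True, exist_ok=True)
```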
Create data/data.jsonl with the following format (one line per video):
```json
{"id": "gym", "video_path": "data/videos/gym.mp4", "clip_path": "data/clips/gym/", "output_path": "data/results/gym.json"}
{"id": "warehouse", "video_path": "data/videos/warehouse.mp4", "clip_path": "data/clips/warehouse/", "output_path": "data/results/warehouse.json"}
```

Fields:

- `id`: unique identifier for the video
- `video_path`: path to the full video file
- `clip_path`: directory containing 15-second clip files (named `0.mp4`, `1.mp4`, etc.)
- `output_path`: path where analysis results will be saved (JSONL format)
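A small loader that validates the four required fields can catch malformed metadata early; `load_metadata` is an illustrative sketch, not project code:

```python
import json

REQUIRED_FIELDS = ("id", "video_path", "clip_path", "output_path")

def load_metadata(jsonl_text):
    """Parse data.jsonl content, checking each record for the required fields."""
    records = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        missing = [k for k in REQUIRED_FIELDS if k not in rec]
        if missing:
            raise ValueError(f"record {rec.get('id')!r} is missing {missing}")
        records.append(rec)
    return records
```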
Cut videos into 15-second segments, for example with the following script:

```bash
#!/bin/bash
video="gym"
input="data/videos/$video.mp4"
mkdir -p "data/clips/$video"

# Total duration in whole seconds
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input")
duration_seconds=$(echo "$duration" | awk '{print int($1)}')

# Number of 15-second segments, rounding up
segments=$(((duration_seconds + 14) / 15))

for ((i=0; i<segments; i++)); do
    start=$((i * 15))
    output="data/clips/$video/$i.mp4"
    # -c copy is fast but cuts on keyframes, so segment boundaries may be slightly off
    ffmpeg -ss $start -i "$input" -t 15 -c copy "$output"
done
```

The search functionality uses an OpenAI-compatible API. Configure your API key in `search.py`:
```python
client = OpenAI(
    api_key="your-api-key-here",
    base_url=""
)
```

Run the processing pipeline with:

```bash
python -m surveillance
```

Modify the `__main__` block in `surveillance.py` to specify:
- `clip_path`: directory containing video clips
- `output_path`: output JSONL file path
- Detection prompt (violence or falling)
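One detail worth sketching: since clips are named `0.mp4`, `1.mp4`, ..., they should be sorted numerically rather than lexicographically (otherwise `10.mp4` would precede `2.mp4`). A hypothetical `__main__` outline, with the project's `streaming_process_video` call left as a placeholder:

```python
from pathlib import Path

def iter_clips(clip_dir):
    """Yield clip files in numeric order: 0.mp4, 1.mp4, ..., 10.mp4."""
    return sorted(Path(clip_dir).glob("*.mp4"), key=lambda p: int(p.stem))

if __name__ == "__main__":
    clip_path = "data/clips/gym/"          # directory containing video clips
    output_path = "data/results/gym.json"  # output JSONL file path
    # A detection prompt (violence or falling) would be chosen here.
    for clip in iter_clips(clip_path):
        pass  # streaming_process_video(str(clip), output_path, prompt)
```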
Results are saved in JSONL format (one JSON object per line):
```json
{"clip_id": 0, "response": ["WARNING: ...", "Sentence 1", "Sentence 2", ...]}
{"clip_id": 1, "response": ["Sentence 1", "Sentence 2", ...]}
```

- `clip_id`: numeric ID of the clip (derived from the filename)
- `response`: list of sentences describing the clip, with an optional "WARNING:" prefix for anomalies
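Given that format, flagged clips can be filtered with a few lines of Python; `find_warnings` is an illustrative sketch, not project code:

```python
import json

def find_warnings(jsonl_text):
    """Return clip_ids whose response contains a WARNING-prefixed sentence."""
    hits = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if any(s.startswith("WARNING:") for s in rec["response"]):
            hits.append(rec["clip_id"])
    return hits

sample = "\n".join([
    '{"clip_id": 0, "response": ["WARNING: possible fall detected.", "A person lies on the floor."]}',
    '{"clip_id": 1, "response": ["People are exercising normally."]}',
])
print(find_warnings(sample))  # prints [0]
```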
Project structure:

```
VideoSurv/
├── gradio_app.py          # Main web interface
├── surveillance.py        # Core video processing
├── search.py              # Video search functionality
├── utils/
│   ├── chat_qwen_omni.py  # Qwen2.5-Omni integration
│   ├── prompts.py         # Detection prompts
│   ├── general.py         # Utility functions
│   └── ...                # Other utilities
├── data/
│   ├── data.jsonl         # Video metadata
│   ├── videos/            # Original videos
│   ├── clips/             # Video clips
│   └── results/           # Analysis results
└── models/
    └── Qwen2.5-Omni-7B/   # AI model files
```