This repository contains the official implementation described in the FUSION 2026 paper:
A Dual-UAV Data Fusion Pipeline for Detection and Localization of Armed Individuals from Video
It is a multi-UAV data fusion extension of the repository:
which was developed for the LAFUSION 2025 paper:
An Ensemble Data Fusion Pipeline for Armed Individual Detection and Localization from UAV Video.
If you use this code or dataset, please cite the relevant papers in your work.
Before running the pipeline, you must download the required YOLO models for people and weapon detection. Place them in the following directories:
- models/people/yolo11n.pt (default people detection model)
- models/weapons/best.pt (default weapon detection model)
You can obtain these models from their respective sources (YOLO official releases or custom training). Ensure the file names and locations match the structure above.
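Before launching the pipeline, it can help to confirm that both model files are where the defaults expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_models` and the `REQUIRED_MODELS` mapping are hypothetical, and only the two paths come from the list above.

```python
from pathlib import Path

# Default model locations as listed in this README.
# The dictionary keys are illustrative labels, not repository identifiers.
REQUIRED_MODELS = {
    "people": Path("models/people/yolo11n.pt"),
    "weapons": Path("models/weapons/best.pt"),
}


def missing_models(root="."):
    """Return the labels of required model files that are absent under root."""
    root = Path(root)
    return [
        name
        for name, rel_path in REQUIRED_MODELS.items()
        if not (root / rel_path).is_file()
    ]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files for:", ", ".join(missing))
    else:
        print("All model files found.")
```

Run it from the repository root so that the relative paths resolve correctly.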
We constructed the Zenodo dataset named: Dual-UAV Dataset for Detection and Localization of Armed Individuals. Download the dataset from Zenodo:
Depending on the structure of the downloaded dataset:
- If the dataset is in raw format (e.g., raw mp4 files), run:
python scripts/organize_raw_files.py
- If you need to revert the organization, run:
python scripts/revert_organized_files.py
To preprocess raw videos into clips and frame samples, use:
python scripts/preprocess_videos.py
Main options:
- --raw: Directory with raw videos (default: inputs/raw)
- --clips: Output directory for video clips (default: inputs/clips)
- --samples: Output directory for frame samples (default: inputs/samples)
- -X, --clip-duration: Duration of each clip in seconds (default: 10)
- -Z, --resolution: Target resolution (default: 1080p)
- -W, --frame-interval: Extract 1 frame every W frames (default: 10)
- --clips-only: Skip raw video processing, only extract frames from existing clips
Example:
python scripts/preprocess_videos.py --raw inputs/raw --clips inputs/clips --samples inputs/samples -X 10 -Z 1080p -W 10
Outputs:
- Processed video clips in inputs/clips/
- Frame sample directories in inputs/samples/
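To get a feel for how -X and -W affect the output volume, the arithmetic can be sketched as follows. This is an estimate under stated assumptions, not the script's actual logic: it assumes a constant frame rate, that a trailing partial clip is kept, and that sampling starts at frame 0 of each clip. The function name `expected_counts` is hypothetical.

```python
import math


def expected_counts(video_seconds, fps, clip_duration=10, frame_interval=10):
    """Estimate the number of clips and the sampled frames per full clip.

    Assumptions (not confirmed by the repository): constant frame rate,
    trailing partial clip kept, sampling starts at frame 0 of each clip.
    """
    n_clips = math.ceil(video_seconds / clip_duration)
    frames_per_clip = int(clip_duration * fps)
    # Frames 0, W, 2W, ... are kept, so the count rounds up.
    samples_per_clip = math.ceil(frames_per_clip / frame_interval)
    return n_clips, samples_per_clip


# A 95-second video at 30 fps with the default -X 10 and -W 10.
print(expected_counts(video_seconds=95, fps=30))  # (10, 30)
```

With the defaults, each 10-second clip at 30 fps contributes about 30 frame samples, so raising -W is the simplest way to reduce the number of images passed to detection.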
To run detection and analysis on the preprocessed samples:
python src/main.py
Main options:
- --model: Path to YOLO model file for people detection (default: models/people/yolo11n.pt)
- --input: Input directory containing sample folders (default: inputs/samples)
- --output: Output directory for processed images (default: output/detections)
- --person-confidence: Confidence threshold for person detections (default: 0.5)
- --weapon-confidence: Confidence threshold for weapon detections (default: 0.5)
- --dual-drone: Enable dual-drone mode (requires --input-drone1 and --input-drone2)
- --angle: Process only a single angle subfolder (e.g., 90)
Example:
python src/main.py --input inputs/samples --output output/detections
Outputs:
- Detection results and overlays in output/detections/
- Logs in logs/
- Statistics summary printed to console and saved to log files
After running the pipeline, generate summary tables:
- Detection table:
python detection_tables.py
- Localization table:
python localization_tables.py
Outputs:
- LaTeX tables in results/detection_tables.tex and results/localization_tables.tex
The scripts/ folder contains additional helper scripts:
- extract_plot.py: Extracts and generates plots from processed data, useful for visualizing detection or localization results.
- extract_srt.py: Extracts subtitle (SRT) information from video or detection outputs, for annotation or review purposes.
- extract_view.py: Generates view images or overlays from processed samples, useful for qualitative analysis or presentation.
Each script can be run with python scripts/<script_name>.py and may have its own command-line options. Refer to the comments in each file for specific usage details.
The src/ folder contains the main implementation of the detection and analysis pipeline. Here is an overview of its structure and functionality:
- main.py: Entry point for running the detection pipeline. Handles argument parsing, logging, and orchestrates the processing of samples using the pipeline modules.
- camera.py: Utilities and classes for camera geometry and parameters, used for localization and projection tasks.
- geoconverter.py: Functions for converting between coordinate systems (e.g., pixel, local, global), essential for localization and fusion.
- plots.py: Code for generating plots and visualizations of detection and localization results.
- position_estimation.py: Implements algorithms for estimating object positions from detections and camera data.
- stats.py: Collects and summarizes statistics from detection runs, including counts, confidence scores, and performance metrics.
- viewer.py: Provides visualization tools for viewing detection results and overlays.
- single_pipeline.py: Implements the detection pipeline for single-drone scenarios, including people and optional weapon detection.
- dual_pipeline.py: Implements the pipeline for dual-drone fusion, associating detections across drones and performing localization.
- people_detector.py: Contains the logic for running YOLO-based people detection on images or frames.
- weapon_detector.py: Contains the logic for running YOLO-based weapon detection on cropped person images.
- detection_fusion.py: Algorithms for fusing detections from multiple sources (e.g., dual-drone mode).
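To illustrate what associating detections across two drones can look like, here is a minimal sketch of greedy nearest-neighbour matching on estimated ground positions. It is not the algorithm implemented in detection_fusion.py or dual_pipeline.py. The function name `fuse_detections`, the dictionary keys `xy`, `conf`, and `sources`, the 2-metre gate, and the confidence-weighted averaging are all assumptions made for the example.

```python
import math


def fuse_detections(dets_a, dets_b, max_dist=2.0):
    """Greedily associate two detection lists by ground distance.

    Each detection is a dict with hypothetical keys:
      'xy'   -> (x, y) position in metres in a shared local frame
      'conf' -> detection confidence in [0, 1]
    Matched pairs are merged; unmatched detections are passed through.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(a["xy"], b["xy"]), i, j)
        for i, a in enumerate(dets_a)
        for j, b in enumerate(dets_b)
    )
    used_a, used_b, fused = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if i in used_a or j in used_b:
            continue
        a, b = dets_a[i], dets_b[j]
        total = a["conf"] + b["conf"]
        # Fall back to equal weights if both confidences are zero.
        wa, wb = (a["conf"], b["conf"]) if total > 0 else (1.0, 1.0)
        xy = tuple(
            (wa * pa + wb * pb) / (wa + wb) for pa, pb in zip(a["xy"], b["xy"])
        )
        fused.append({"xy": xy, "conf": max(a["conf"], b["conf"]), "sources": 2})
        used_a.add(i)
        used_b.add(j)
    # Keep detections seen by only one drone.
    fused += [dict(d, sources=1) for i, d in enumerate(dets_a) if i not in used_a]
    fused += [dict(d, sources=1) for j, d in enumerate(dets_b) if j not in used_b]
    return fused


drone1 = [{"xy": (0.0, 0.0), "conf": 0.5}, {"xy": (10.0, 10.0), "conf": 0.9}]
drone2 = [{"xy": (1.0, 0.0), "conf": 0.5}]
for det in fuse_detections(drone1, drone2):
    print(det)
```

Greedy matching is simple and adequate when people are well separated on the ground. In crowded scenes an optimal assignment method would be a more robust choice.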
The modules are designed to be reusable and independently configurable, so the pipeline can be extended without changing its entry point. It supports both single-drone and dual-drone setups, with options for detection, localization, statistics, and visualization.
For more details on each module, see the comments in the respective files.