[ACM MM 25] Official repo of "UEMM-Air: Enable UAVs to Undertake More Multi-modal Tasks"

1e12Leon/UEMM-Air

UEMM-Air: Enable UAVs to Undertake More Multi-modal Tasks

Liang Yao (姚亮), Fan Liu (刘凡), Shengxiang Xu (徐圣翔),

Chuanyi Zhang (张传一), Shimin Di (邸世民), Xing Ma (马幸),

Jianyu Jiang (江建谕), Zequan Wang (王泽权), Jun Zhou (周峻)

Corresponding Author

🤗UEMM-Air     ✈️AirNavigation

News

  • 2025/8/01: Our paper has been accepted to the ACM Multimedia 2025 Datasets Track!
  • 2025/1/20: We have open-sourced the dataset generation system, which is available in the AirNavigation repository.
  • 2024/12/11: Welcome to UEMM-Air! The dataset is open-sourced in this repository.

Introduction

We present a large-scale synthetic drone vision dataset with 6 paired multimodal streams (120k+ sequences) and 4D task versatility, enabling comprehensive research in perception, navigation, and autonomy. Built on Unreal Engine, it offers photorealistic aerial scenarios with precise physics, diverse environmental variations, and pixel-perfect annotations. The paired modalities facilitate cross-modal learning and domain adaptation studies, while the multi-task support (detection, segmentation, retrieval, and cross-modality understanding) encourages holistic perception modeling. Because the data is synthetic, it is scalable and reproducible and can cover rare events, addressing critical gaps in real-world drone datasets. This work establishes a new benchmark for robust, generalizable vision systems in complex aerial environments.
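Since the modalities are paired, a common first step after downloading is to match files across modality folders. The snippet below is only a minimal sketch of that idea, not the official loader: it assumes a hypothetical layout in which each modality lives in its own subdirectory and paired files share a filename stem. The function name `pair_by_stem` and the folder names `rgb` and `depth` are placeholders for illustration, so check the downloaded archives for the actual structure.

```python
from collections import defaultdict
from pathlib import Path


def pair_by_stem(root, modalities):
    """Group files under root/<modality>/ by shared filename stem.

    Assumes a hypothetical layout (one folder per modality, same stem
    for paired files). Returns {stem: {modality: Path}} and keeps only
    stems that are present in every listed modality.
    """
    root = Path(root)
    groups = defaultdict(dict)
    for modality in modalities:
        for path in sorted((root / modality).glob("*")):
            if path.is_file():
                groups[path.stem][modality] = path
    return {
        stem: files
        for stem, files in sorted(groups.items())
        if len(files) == len(modalities)
    }
```

Calling `pair_by_stem("path/to/dataset", ["rgb", "depth"])` would then return one entry per sample that exists in both folders, and samples missing from either modality are skipped rather than raising an error.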

Download the UEMM-Air 📂

Multi-modality Images

Object Detection

Instance Segmentation

Referring Expression Segmentation

Image-Text Retrieval

Supplementary Materials


License 🚨

This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC 4.0).

By downloading or using the Dataset, as a Licensee I/we understand, acknowledge, and hereby agree to all the terms of use. This dataset is provided "as is" and without any warranty of any kind, express or implied. The authors and their affiliated institutions are not responsible for any errors or omissions in the dataset, or for the results obtained from the use of the dataset. The dataset is intended for academic research purposes only, and not for any commercial or other purposes. The users of the dataset agree to acknowledge the source of the dataset and cite the relevant papers in any publications or presentations that use the dataset. The users of the dataset also agree to respect the intellectual property rights of the original data owners.

Acknowledgements

  • Thanks to Yanwen Ding (丁彦文) for her efforts on the Image-Text Retrieval annotations.

Citation🎈

@misc{yao2025uemmair,
      title={UEMM-Air: Make Unmanned Aerial Vehicles Perform More Multi-modal Tasks}, 
      author={Liang Yao and Fan Liu and Shengxiang Xu and Chuanyi Zhang and Xing Ma and Jianyu Jiang and Zequan Wang and Shimin Di and Jun Zhou},
      year={2025},
      eprint={2406.06230},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.06230}, 
}

Contact ✉

Please contact [email protected].
