UniMamba

Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection

Xin Jin♠️,2,3, Haisheng Su♠️,1,3 📧, Kai Liu3, Cong Ma3, Wei Wu3,4, Fei Hui3 📧, Junchi Yan1 📧

1 School of Computer Science, Shanghai Jiao Tong University

2 Chang'an University, 3 SenseAuto Research, 4 Tsinghua University

♠️ Equal Contributions, 📧 Corresponding authors

Paper

News

  • Mar. 9th, 2025: We released our paper on arXiv. Code and models are coming soon. Please stay tuned! ☕️
  • Mar. 9th, 2025: Our paper has been accepted to CVPR 2025!


Introduction

Recent advances in LiDAR-based 3D detection have demonstrated the effectiveness of Transformer frameworks at capturing global dependencies in point cloud spaces, serializing the 3D voxels into a flattened 1D sequence for iterative self-attention. However, the spatial structure of the 3D voxels is inevitably destroyed during serialization. Moreover, because of the considerable number of 3D voxels and the quadratic complexity of Transformers, the sequence must be partitioned into multiple groups before being fed to the Transformer, which limits the receptive field.

Inspired by the impressive performance of State Space Models (SSMs), we propose Unified Mamba (UniMamba), which seamlessly integrates the merits of 3D convolution and SSMs in a concise multi-head manner to perform "local and global" spatial context aggregation efficiently and simultaneously. Specifically, each UniMamba block consists of three main components: spatial locality modeling, complementary Z-order serialization, and a local-global sequential aggregator. The spatial locality modeling module uses 3D submanifold convolution to capture dynamic spatial position embeddings before serialization. The efficient Z-order curve then serializes the voxels both horizontally and vertically. Finally, the local-global sequential aggregator applies a channel grouping strategy to efficiently encode both local and global spatial inter-dependencies using a multi-head SSM. Stacked UniMamba blocks further form an encoder-decoder architecture that facilitates hierarchical multi-scale spatial learning.

Extensive experiments are conducted on three popular datasets: nuScenes, Waymo, and Argoverse 2. Notably, UniMamba achieves 70.2 mAP on the nuScenes dataset.
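As a rough illustration of the Z-order serialization step described above, the sketch below computes Morton codes by bit-interleaving integer voxel coordinates and sorts the voxels along the resulting curve. This is not taken from the released code; the function names and the 10-bit-per-axis limit are assumptions for the sake of the example.

```python
import numpy as np

def _spread_bits(v):
    # Insert two zero bits between each of the low 10 bits of v,
    # so three axes can be interleaved into one 30-bit Morton code.
    v = v & 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8))  & 0x0300F00F
    v = (v | (v << 4))  & 0x030C30C3
    v = (v | (v << 2))  & 0x09249249
    return v

def z_order_codes(coords):
    # Morton codes for an (N, 3) integer array of (x, y, z) voxel indices.
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    return (_spread_bits(z) << 2) | (_spread_bits(y) << 1) | _spread_bits(x)

def serialize(coords):
    # Permutation that orders voxels along the Z-order curve.
    return np.argsort(z_order_codes(coords.astype(np.int64)), kind="stable")
```

The point of this ordering is that spatially adjacent voxels tend to land near each other in the flattened 1D sequence, so a 1D sequence model scanning the serialized voxels still sees mostly local neighborhoods.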

Framework

Evaluation on nuScenes dataset

Evaluation on Waymo Open dataset

Evaluation on Argoverse 2 dataset

Getting Started

TBD

License

This project is released under the MIT license.

Contact

If you have any questions, please contact Haisheng Su via email ([email protected]).

Citation

If you find UniMamba useful in your research or applications, please consider giving us a star 🌟 and citing it using the following BibTeX entry:

```bibtex
@article{su2025unimamba,
  title={UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection},
  author={Jin, Xin and Su, Haisheng and Liu, Kai and Ma, Cong and Wu, Wei and Hui, Fei and Yan, Junchi},
  journal={arXiv preprint arXiv:2503.12009},
  year={2025}
}
```
