Alibaba AMAP CV Lab

👋 About Us

We are the Alibaba AMAP CV Lab. We focus on cutting-edge research and innovative applications of computer vision, and we are dedicated to building core technological capabilities for the spatiotemporal internet. As a key practitioner of technology within Alibaba's spatial intelligent internet, we work at the intersection of the physical and digital worlds, using AI to empower smart mobility and daily life. As a core technical driver within Amap, our team pioneers:

Next-Generation 3D Map Engines
Multimodal Understanding & Generation
Spatial Intelligence
World Modeling

We welcome contributions, issues, and feedback!
Feel free to ⭐ the repos below to stay updated.

🔈 Latest News

🏛 Jul 05, 2025 – Our paper FantasyTalking is accepted by ACM MM 2025.
🏛 Jun 26, 2025 – Our paper SeqGrowGraph is accepted by ICCV 2025.
📢 May 23, 2025 – We released the full project of FSDrive.
🏛 Apr 29, 2025 – Our paper G3PT is accepted by IJCAI 2025.
📢 Apr 28, 2025 – We released the inference code and model weights of FantasyTalking.
📢 Apr 24, 2025 – We released the inference code and model weights of FantasyID.

🔧 Public Technologies

🗺️ 3D Map Engine

Next-generation engine for real-time rendering and updating of large-scale 3D maps with high accuracy.

📑 SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions

A generative framework that reframes lane network learning as a process of incrementally building an adjacency matrix.
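To illustrate what "incrementally building an adjacency matrix" means as a data structure, here is a minimal sketch in which a directed lane graph grows by one node per step, so an n × n matrix becomes (n+1) × (n+1). The function `expand`, the convention that `adj[i][j] = 1` means an edge from lane i to lane j, and the toy lanes are all hypothetical illustrations; this is not SeqGrowGraph's model or code, which generates these expansions with a learned generative model.

```python
from typing import List


def expand(adj: List[List[int]], incoming: List[int], outgoing: List[int]) -> List[List[int]]:
    """Hypothetical helper: grow an n x n directed adjacency matrix to (n+1) x (n+1).

    Convention (assumed): adj[i][j] == 1 means an edge from node i to node j.
    incoming[i] == 1 adds an edge from existing node i into the new node.
    outgoing[j] == 1 adds an edge from the new node into existing node j.
    """
    n = len(adj)
    if len(incoming) != n or len(outgoing) != n:
        raise ValueError("incoming/outgoing must have one entry per existing node")
    # Append the new column: edges from existing nodes into the new node.
    grown = [row + [incoming[i]] for i, row in enumerate(adj)]
    # Append the new row: edges from the new node to existing nodes (no self-loop).
    grown.append(list(outgoing) + [0])
    return grown


# Toy lane graph built one node at a time: lane 0 -> lane 1 -> lane 2.
adj: List[List[int]] = []
adj = expand(adj, [], [])          # add node 0 (no edges yet)
adj = expand(adj, [1], [0])        # add node 1 with edge 0 -> 1
adj = expand(adj, [0, 1], [0, 0])  # add node 2 with edge 1 -> 2
for row in adj:
    print(row)
```

Each step only appends one row and one column, so earlier connectivity is never rewritten; that is the property that makes the matrix a natural target for step-by-step generation.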

📑 Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

A benchmark and multimodal approach for integrating lane-level traffic sign regulations into vectorized HD maps.

📑 Global-Guided Focal Neural Radiance Field for Large-Scale Scene Representation

📐 Spatial Intelligence

Framework for spatial reasoning and path planning in autonomous navigation and robotics.

📑 FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

The first vision-language-action (VLA) model for visual reasoning in autonomous driving. It proposes a spatio-temporal chain-of-thought (CoT) to think visually about trajectory planning, and it unifies visual generation and understanding with minimal data.

🌈 Multimodal AI

Toolkit for unified understanding and generation across text, image, video, audio and spatial data.

📑 G3PT: Unleash the Power of Autoregressive Modeling in 3D Generative Tasks

The first native 3D generative foundation model based on next-scale autoregression.

📑 A Study on the Adverse Impact of Synthetic Speech on Speech Recognition

An analysis of speech recognition performance under synthetic speech interference, with an exploration of novel solutions.

🤖 Human AIGC

Our family of human-related AIGC models, with more coming soon. Please check out our Fantasy AIGC Family for more details.

🤡 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

A novel expression-driven video-generation method that pairs emotion-enhanced learning with masked cross-attention, enabling high-quality, richly expressive animations in both single- and multi-portrait scenarios.

🗣️ FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

The first Wan-based high-fidelity audio-driven avatar system that synchronizes facial expressions, lip motion, and body gestures in dynamic scenes.

🆔 FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation

A tuning-free text-to-video model that leverages 3D facial priors, multi-view augmentation, and layer-aware guidance injection to deliver dynamic, identity-preserving video generation.

📑 HumanRig: Learning Automatic Rigging for Humanoid Characters in Animation

The first dataset for automatic rigging of 3D-generated digital humans, together with a transformer-based end-to-end automatic rigging algorithm.

🌐 World Modeling

Platform for constructing and querying dynamic digital twins of real-world environments.
🔜 Coming soon

💡 Others

📑 DPOSE: Online Keypoint-CAM Guided Inference for Driver Pose Estimation

An optimization scheme for a proprietary human pose estimation (HPE) task in driver monitoring system (DMS) scenarios. It combines a pose-wise hard mining strategy for distribution balance with an online keypoint-aligned Grad-CAM loss that constrains activations to semantic regions.

🤖 Doubly-Fused ViT: Fuse Information from Dual Vision Transformer Streams

📑 SCMT: Self-Correction Mean Teacher for Semi-supervised Object Detection

A self-correction mean teacher architecture for semi-supervised object detection that mitigates the impact of noisy pseudo-labels.
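For readers unfamiliar with the "mean teacher" setup that SCMT builds on: in the standard formulation, the teacher's weights are an exponential moving average (EMA) of the student's weights, and the teacher produces the pseudo-labels. The sketch below shows only that generic EMA update, using plain Python lists as stand-ins for model parameters. It does not implement SCMT's self-correction mechanism, and the function name `ema_update` and the decay value are illustrative assumptions.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               decay: float = 0.99) -> None:
    """Hypothetical helper: in-place mean-teacher update.

    For every parameter: teacher <- decay * teacher + (1 - decay) * student.
    Parameters are plain lists of floats standing in for real tensors.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    for name, t_vals in teacher.items():
        s_vals = student[name]
        for i in range(len(t_vals)):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s_vals[i]


# Toy example: the teacher drifts slowly toward the student's weights.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
ema_update(teacher, student, decay=0.9)
print(teacher["w"])
```

A decay close to 1 makes the teacher change slowly, which smooths out step-to-step noise in the student and tends to yield more stable pseudo-labels.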
