
@amap-cvlab

Alibaba AMAP CV Lab

Leading computer vision innovation, powering smart mobility and AI-driven spatiotemporal technologies.


Read in Chinese

👋 About Us

We are the Alibaba AMAP CV Lab. We conduct cutting-edge computer vision research and turn it into innovative applications, with the goal of building core technological capabilities for the spatiotemporal internet. As a key technology practitioner within Alibaba's spatial intelligence internet, the lab works at the intersection of the physical and digital worlds, using AI to empower smart mobility and everyday life. As a core technical driver within Amap, our team pioneers:

  • Next-Generation 3D Map Engines
  • Multimodal Understanding & Generation
  • Spatial Intelligence
  • World Modeling

We welcome contributions, issues, and feedback!
Feel free to ⭐ the repos below to stay updated.

🔈 Latest News

  • 🏛 Jul 05, 2025 – Our paper FantasyTalking has been accepted to ACM MM 2025.
  • 🏛 Jun 26, 2025 – Our paper SeqGrowGraph has been accepted to ICCV 2025.
  • 📢 May 23, 2025 – We released the full FSDrive project.
  • 🏛 Apr 29, 2025 – Our paper G3PT has been accepted to IJCAI 2025.
  • 📢 Apr 28, 2025 – We released the inference code and model weights for FantasyTalking.
  • 📢 Apr 24, 2025 – We released the inference code and model weights for FantasyID.

🔧 Public Technologies

🗺️ 3D Map Engine

Next-generation engine for real-time rendering and updating of large-scale 3D maps with high accuracy.

📑 SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions

arXiv Publish

A generative framework that reframes lane network learning as a process of incrementally building an adjacency matrix.
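As a rough illustration of the representation only (not the paper's model or code), a lane graph can be grown one node at a time, with the adjacency matrix gaining one row and one column per step. The helper names `expand_graph` and `build_graph` and the "list of predecessors per step" format are hypothetical choices for this sketch.

```python
import numpy as np


def expand_graph(adj: np.ndarray, predecessors: list[int]) -> np.ndarray:
    """Append one node to a directed lane graph (hypothetical helper).

    adj[i, j] == 1 means lane node i flows into lane node j.
    `predecessors` lists already-existing nodes that flow into the new node.
    """
    n = adj.shape[0]
    grown = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    grown[:n, :n] = adj  # keep every existing connection
    for p in predecessors:
        if not 0 <= p < n:
            raise ValueError(f"predecessor {p} does not exist yet")
        grown[p, n] = 1  # edge from existing node p into the new node
    return grown


def build_graph(steps: list[list[int]]) -> np.ndarray:
    """Build the full adjacency matrix from a sequence of expansion steps."""
    adj = np.zeros((0, 0), dtype=np.int8)
    for predecessors in steps:
        adj = expand_graph(adj, predecessors)
    return adj


# A fork and merge: node 0 splits into nodes 1 and 2, which merge into node 3.
adj = build_graph([[], [0], [0], [1, 2]])
print(adj)
```

Because a new node can only link to nodes that already exist, every prefix of the step sequence is itself a valid, smaller graph, which is one reason a sequential formulation fits autoregressive generation. This simplified version records only edges into the new node.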

📑 Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

Home Page arXiv Publish

Benchmark and multi-modal approach for integrating lane-level traffic sign regulations into vectorized HD maps.
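To make "lane-level" concrete, one plausible representation stores each lane as a polyline and attaches the sign-derived rules that govern that specific lane. The schema below (`Regulation`, `LaneElement`, `attach_sign`, and the rule names) is a hypothetical illustration, not the benchmark's actual data format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Regulation:
    """A rule derived from a traffic sign (hypothetical schema)."""
    kind: str                      # e.g. "speed_limit" or "no_left_turn"
    value: Optional[float] = None  # numeric parameter, if the rule has one


@dataclass
class LaneElement:
    """A lane centerline stored as a polyline, as in vectorized HD maps."""
    lane_id: int
    polyline: list[tuple[float, float]]  # (x, y) vertices in metres
    regulations: list[Regulation] = field(default_factory=list)


def attach_sign(lanes: dict[int, LaneElement], lane_ids: list[int], rule: Regulation) -> None:
    """Associate one sign's rule with every lane it governs."""
    for lane_id in lane_ids:
        lanes[lane_id].regulations.append(rule)


lanes = {
    0: LaneElement(0, [(0.0, 0.0), (50.0, 0.0)]),
    1: LaneElement(1, [(0.0, 3.5), (50.0, 3.5)]),
}
# A speed-limit sign that applies to both lanes, and a turn ban on lane 1 only.
attach_sign(lanes, [0, 1], Regulation("speed_limit", 60.0))
attach_sign(lanes, [1], Regulation("no_left_turn"))
print([r.kind for r in lanes[1].regulations])  # → ['speed_limit', 'no_left_turn']
```

Keeping rules on individual lanes, rather than on the sign's position, is what lets a downstream planner ask "which rules apply to the lane I am in?" directly.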

📑 Global-Guided Focal Neural Radiance Field for Large-Scale Scene Representation

Home Page arXiv Publish

📐 Spatial Intelligence

Framework for spatial reasoning and path planning in autonomous navigation and robotics.

📑 FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

The first vision-language-action (VLA) model for visual reasoning in autonomous driving. It proposes a spatio-temporal chain-of-thought (CoT) that lets the model think visually when planning trajectories, and it unifies visual generation and understanding using minimal data.

Home Page arXiv Code

🌈 Multimodal AI

Toolkit for unified understanding and generation across text, image, video, audio and spatial data.

📑 G3PT: Unleash the Power of Autoregressive Modeling in 3D Generative Tasks

arXiv Publish

The first native 3D generation foundation model based on next-scale autoregression.
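Next-scale autoregression generates a coarse-to-fine pyramid of token maps, predicting a whole scale at each step conditioned on the coarser scales, instead of predicting one token at a time. The loop below sketches only that control flow for a cubic 3D grid. `predict_scale` is a hypothetical stand-in for the learned transformer (here a nearest-neighbour upsample of the previous scale plus a small random residual), so this is not G3PT's model.

```python
import numpy as np


def upsample(grid: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsample of a cubic grid to shape (size, size, size)."""
    idx = np.arange(size) * grid.shape[0] // size
    return grid[np.ix_(idx, idx, idx)]


def predict_scale(history: list[np.ndarray], size: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for p(scale_k | scale_1, ..., scale_{k-1}).

    For brevity it conditions only on the most recent (coarser) scale.
    """
    if not history:
        return rng.standard_normal((size, size, size))
    context = upsample(history[-1], size)
    return context + 0.1 * rng.standard_normal((size, size, size))


def generate(scales: list[int], seed: int = 0) -> list[np.ndarray]:
    """Run one autoregressive step per scale, from coarsest to finest."""
    rng = np.random.default_rng(seed)
    history: list[np.ndarray] = []
    for size in scales:
        history.append(predict_scale(history, size, rng))
    return history


pyramid = generate([1, 2, 4, 8])
print([g.shape for g in pyramid])  # → [(1, 1, 1), (2, 2, 2), (4, 4, 4), (8, 8, 8)]
```

The number of sequential steps equals the number of scales (four here) rather than the number of voxels, which is the main practical appeal of predicting scale by scale.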

📑 A Study on the Adverse Impact of Synthetic Speech on Speech Recognition

Publish

Analyzes speech recognition performance under synthetic speech interference and explores new ways to mitigate it.

🤖 Human AIGC

Our family of human-related AIGC models, with more coming soon. Please see our Fantasy AIGC Family for details.

🤡 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

Home Page arXiv Code

A novel expression-driven video generation method that combines emotion-enhanced learning with masked cross-attention to create high-quality, richly expressive animations of both single and multiple portraits.
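In a multi-character scene, masked cross-attention can stop one character's expression signal from leaking into another face: a video token inside character A's face region attends only to A's expression tokens. The NumPy sketch below shows that masking pattern for a single attention head. The `owner` id arrays, the shapes, and the choice that background tokens attend to nothing are assumptions for illustration, not FantasyPortrait's exact design.

```python
import numpy as np


def masked_cross_attention(q, k, v, q_owner, k_owner):
    """Single-head cross-attention restricted to matching character ids.

    q: (Nq, d) video tokens; k, v: (Nk, d) expression tokens.
    q_owner: (Nq,) character id per video token, -1 for background.
    k_owner: (Nk,) character id per expression token.
    Query i may attend to key j only if q_owner[i] == k_owner[j] >= 0.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (Nq, Nk)
    allowed = (q_owner[:, None] == k_owner[None, :]) & (q_owner[:, None] >= 0)
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over allowed keys only; rows with no allowed
    # key (background) use a max of 0 so no NaN appears, and end up all-zero.
    has_any = allowed.any(axis=1, keepdims=True)
    row_max = np.where(has_any, scores.max(axis=1, keepdims=True), 0.0)
    weights = np.exp(scores - row_max)  # masked entries become exp(-inf) = 0
    denom = weights.sum(axis=1, keepdims=True)
    weights = weights / np.where(denom > 0, denom, 1.0)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))       # 5 video tokens
k = rng.standard_normal((4, 8))       # 4 expression tokens
v = rng.standard_normal((4, 8))
q_owner = np.array([0, 0, 1, 1, -1])  # last video token is background
k_owner = np.array([0, 0, 1, 1])      # two expression tokens per character
out, w = masked_cross_attention(q, k, v, q_owner, k_owner)
print(np.round(w, 2))
```

The printed weight matrix is block-diagonal: rows for character 0 have zeros in character 1's columns and vice versa, and the background row is entirely zero.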

🗣️ FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Home Page arXiv Publish HuggingFace HuggingFace ModelScope Code

The first Wan-based high-fidelity audio-driven avatar system that synchronizes facial expressions, lip motion, and body gestures in dynamic scenes.

🆔 FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation

Home Page arXiv HuggingFace ModelScope Code

A tuning-free text-to-video model that leverages 3D facial priors, multi-view augmentation, and layer-aware guidance injection to deliver dynamic, identity-preserving video generation.

📑 HumanRig: Learning Automatic Rigging for Humanoid Characters in Animation

Home Page arXiv Publish Code HuggingFace

The first dataset for automatic rigging of 3D-generated digital humans, together with a transformer-based, end-to-end automatic rigging algorithm.

🌐 World Modeling

Platform for constructing and querying dynamic digital twins of real-world environments.
🔜 Coming soon

💡 Others

📑 DPOSE: Online Keypoint-CAM Guided Inference for Driver Pose Estimation

Publish

An optimization scheme for a proprietary human pose estimation (HPE) task in driver monitoring system (DMS) scenarios. It combines a pose-wise hard mining strategy that balances the data distribution with an online keypoint-aligned Grad-CAM loss that constrains activations to semantic regions.
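Grad-CAM weights each feature channel by its spatially averaged gradient and keeps the positive part of the weighted channel sum. One simple way to turn that map into a keypoint-aligned penalty, assumed here for illustration and not necessarily DPOSE's formulation, is to measure the fraction of CAM activation that falls outside disks around the annotated keypoints. The functions `grad_cam`, `keypoint_mask`, and `alignment_loss` and the `radius` parameter are hypothetical, and the hard-mining part is not shown.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one layer's activations and gradients, both (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))                # alpha_c: pooled gradients
    cam = np.tensordot(weights, activations, axes=1)     # sum_c alpha_c * A_c -> (H, W)
    return np.maximum(cam, 0.0)                          # ReLU keeps positive evidence


def keypoint_mask(shape, keypoints, radius):
    """Boolean mask that is True within `radius` cells of any (row, col) keypoint."""
    rows, cols = np.indices(shape)
    mask = np.zeros(shape, dtype=bool)
    for r, c in keypoints:
        mask |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return mask


def alignment_loss(cam, mask, eps=1e-8):
    """Fraction of CAM activation outside the keypoint regions (0 = fully aligned)."""
    return float(cam[~mask].sum() / (cam.sum() + eps))


rng = np.random.default_rng(0)
A = rng.random((16, 12, 12))            # stand-in feature maps
G = rng.standard_normal((16, 12, 12))   # stand-in gradients of a keypoint score
cam = grad_cam(A, G)
mask = keypoint_mask(cam.shape, [(3, 4), (8, 6)], radius=2)
print(alignment_loss(cam, mask))
```

In actual training the same quantities would be computed inside an autodiff framework so that the loss can backpropagate into the network; NumPy is used here only to show the arithmetic.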

🤖 Doubly-Fused ViT: Fuse Information from Dual Vision Transformer Streams

Publish Code

📑 SCMT: Self-Correction Mean Teacher for Semi-supervised Object Detection

Publish

A self-correction mean teacher architecture that mitigates the impact of noisy pseudo-labels in semi-supervised object detection.
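In a mean-teacher setup, the teacher's weights are an exponential moving average (EMA) of the student's, and the teacher's confident detections on unlabeled images serve as the student's pseudo-labels. The sketch below shows those two standard building blocks with parameters held as NumPy arrays in a dict. The confidence-threshold filter is the common baseline; SCMT's self-correction mechanism for noisy pseudo-labels is not reproduced here. `ema_update`, `filter_pseudo_labels`, and the momentum and threshold values are illustrative assumptions.

```python
import numpy as np


def ema_update(teacher: dict, student: dict, momentum: float = 0.999) -> None:
    """Update the teacher dict in place: t <- momentum * t + (1 - momentum) * s."""
    for name, w_student in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * w_student


def filter_pseudo_labels(boxes: np.ndarray, scores: np.ndarray, labels: np.ndarray,
                         threshold: float = 0.7):
    """Keep only teacher detections whose confidence reaches the threshold."""
    keep = scores >= threshold
    return boxes[keep], labels[keep]


teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(10):
    ema_update(teacher, student, momentum=0.9)
print(teacher["w"])  # each entry equals 1 - 0.9**10, about 0.6513

boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]], dtype=float)
scores = np.array([0.95, 0.40, 0.81])
labels = np.array([2, 7, 2])
kept_boxes, kept_labels = filter_pseudo_labels(boxes, scores, labels)
print(kept_labels)  # → [2 2]
```

A hard threshold discards low-confidence boxes but still passes through confident mistakes, which is the kind of pseudo-label noise a self-correction scheme is meant to address.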

Popular repositories

  1. MV-Painter (Public)

    Python, 307 stars, 60 forks

  2. .github (Public)

    1 star

  3. CE-Nav (Public)

    Official implementation of CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation.

    Python, 1 star

Repositories

Showing 3 of 3 repositories
