Starred repositories
BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
[NeurIPS 2019, Spotlight] Point-Voxel CNN for Efficient 3D Deep Learning
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"
[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacch…
Submanifold sparse convolutional networks
[NeurIPS 2024 D&B] Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
[IROS 2025] Generalizable Humanoid Manipulation with 3D Diffusion Policies. Part 1: Train & Deploy of iDP3
[RSS 2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Official implementation of the paper "EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation".
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
A collection of vision-language-action model post-training methods.
AISystem refers to AI systems broadly, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
Book_2_《可视之美》 (Beauty of Visualization) | Iris Book series: from basic arithmetic to machine learning; feedback and corrections welcome
LeRobot sim2real code. Train in fast simulation and deploy visual policies zero-shot to the real world