- KAIST
- Daejeon, Korea
- choidaedae.github.io
- https://scholar.google.com/citations?user=_EHGDJoAAAAJ&hl=ko
Stars
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR-25)
Official implementation of paper Neural Green’s Functions (NeurIPS 2025)
Implementation for VPBench proposed in paper Visually Prompted Benchmarks Are Surprisingly Fragile
[RA-L 2025] FrontierNet: Learning Visual Cues to Explore
The official repository of BEAR: Benchmarking and Enhancing Multimodal Language Models with Atomic Embodied Capabilities
Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"
Vision-and-Language Navigation in Continuous Environments using Habitat
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
A simulation platform for versatile Embodied AI research and developments.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Thinking in 360°: Humanoid Visual Search in the Wild
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Training Visual Reasoners with Multimodal Verifiers
The Best Agent Harness. Meet Sisyphus: The Batteries-Included Agent that codes like you.
[NeurIPS 2025] EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios.
[ECCV 2024 Oral 🔥] Arc2Face: A Foundation Model for ID-Consistent Human Faces ------------------------ [ICCVW 2025] ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Official implementation of "Repurposing Video Diffusion Transformers for Robust Point Tracking"
Code for "EgoX: Egocentric Video Generation from a Single Exocentric Video"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unofficial reimplementation of VLA-0 using TRL's SFTTrainer.
Native and Compact Structured Latents for 3D Generation