A guidance language for controlling large language models.
Code for 3D-LLM: Injecting the 3D World into Large Language Models
LAVIS - A One-stop Library for Language-Vision Intelligence
[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down
A natural language interface for computers
[ICCV 2023] PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
Align 3D Point Cloud with Multi-modalities for Large Language Models
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
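At its core, the kind of transformer text generation this repo implements is an autoregressive decoding loop: score the next token given the tokens so far, pick one, append it, and repeat until an end-of-sequence token. A minimal greedy-decoding sketch, using a toy next-token function in place of a real transformer (all names here are illustrative, not taken from the repo):

```python
def greedy_generate(next_token_logits, prompt, eos, max_new_tokens=10):
    """Autoregressive greedy decoding: repeatedly append the argmax token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # one score per vocabulary id
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eos:  # stop once the end-of-sequence token is produced
            break
    return tokens

def toy_model(tokens):
    """Toy stand-in for a transformer: always prefers last_token + 1 (mod 5)."""
    vocab = 5
    logits = [0.0] * vocab
    logits[(tokens[-1] + 1) % vocab] = 1.0
    return logits

print(greedy_generate(toy_model, [0], eos=4))  # → [0, 1, 2, 3, 4]
```

In a real implementation the `next_token_logits` call is a forward pass through the model (with a KV cache so each step is incremental), and argmax can be swapped for temperature or top-k sampling.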
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Code for the RA-L paper "Language Models as Zero-Shot Trajectory Generators" available at https://arxiv.org/abs/2310.11604.
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Paper list in the survey paper: Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Official Implementation of the ICCV 2023 paper: Perpetual Humanoid Control for Real-time Simulated Avatars
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or in headless mode to run background agents
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"