Stars
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" -- the first LLM-based web agent and benchmark for generalist web agents
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
🌎💪 BrowserGym, a Gym environment for web task automation
A Reproduction of GDM's Nested Learning Paper
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows — particularly Claude Code
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
Scaling Long-Horizon LLM Agent via Context-Folding
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Code for "ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch"; the dataset is publicly available at https://huggingface.co/datasets/ji…
A unified toolkit for pruning in AI model compression research.
🐉 Loong: Synthesize Long CoTs at Scale through Verifiers.
Tools for working with the Abstraction & Reasoning Corpus