- University of Science and Technology of China
- Hefei
Stars
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
ProactiveBench: A Comprehensive Benchmark for VideoLLM Proactive Interaction Evaluation
Tongyi Deep Research, the Leading Open-source Deep Research Agent
GenAI Agent Framework, the Pydantic way
Open-source framework for conversational voice AI agents
Reference implementation for function calling with Deepgram's Voice Agent API
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
An open-source AI agent that brings the power of Gemini directly into your terminal.
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
A Survey of Reinforcement Learning for Large Reasoning Models
A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
Democratizing Reinforcement Learning for LLMs
List of language agents based on paper "Cognitive Architectures for Language Agents"
A Python script for downloading Hugging Face datasets and models.
The docker-compose files for setting up a SearXNG instance with Docker.
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Search-R1: An Efficient, Scalable RL Training Framework for LLMs That Interleave Reasoning and Search Engine Calling, Based on veRL
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving," held at ECCV 2024.
CLIP+MLP Aesthetic Score Predictor
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.