Starred repositories
[ACL 24] The official implementation of MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation.
This is the repository for the Tool Learning survey.
A curated list of awesome LLM agents frameworks.
An Autonomous LLM Agent for Complex Task Solving
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Official Implementation of "Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation"
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks
Code of the paper "Correctable Landmark Discovery via Large Models for Vision-Language Navigation" (TPAMI 2024)
Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"
This repository contains a Vulkan Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and hav…
A curated list of Android Security materials and resources for pentesters and bug hunters
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of 《大模型算法》 ("Large Model Algorithms")! 💥 (100+ LLM/RL Algorithm Maps)
Awesome RL Reasoning Recipes ("Triple R")
Visualizing the attention of vision-language models
Official implementation of DepthLM
A modular graph-based Retrieval-Augmented Generation (RAG) system
[TMLR 2024] Repository for VLN with foundation models
[CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR-25)
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
MiMo-Audio: Audio Language Models are Few-Shot Learners
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)