A curated list of awesome platforms, tools, practices and resources that help you run LLMs locally
- LM Studio - discover, download and run local LLMs
lemonade - a local LLM server with GPU and NPU acceleration
ollama - get up and running with LLMs
llama.cpp - LLM inference in C/C++
ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
koboldcpp - run GGUF models easily with a KoboldAI UI
vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
Nano-vLLM - a lightweight vLLM implementation built from scratch
vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
exo - run your own AI cluster at home with everyday devices
sglang - a fast serving framework for large language models and vision language models
Open WebUI - a user-friendly AI interface (supports Ollama, OpenAI API, ...)
Page Assist - use your locally running AI models to assist you in your web browsing
- AI Models & API Providers Analysis - understand the AI landscape to choose the best model and provider for your use case
- LLM Explorer - explore a list of open-source LLMs
- Dubesor LLM Benchmark table - small-scale manual performance comparison benchmark
- oobabooga benchmark - a list sorted by size (on disk) for each score
- Qwen - powered by Alibaba Cloud
- Mistral AI - a pioneering French artificial intelligence startup
- Tencent - a profile of a Chinese multinational technology conglomerate and holding company
- Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
- bartowski - providing GGUF versions of popular LLMs
- Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
- Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets
- Qwen3 - a collection of the latest generation Qwen LLMs
- Qwen3-Coder - a collection of Qwen's most agentic code models to date
- Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
- OpenAI-o3-open - the very first open-weight LLM from OpenAI
- Mistral-Small-3.2-24B-Instruct-2506 - a versatile model designed to handle a wide range of generative AI tasks, including instruction following, conversational assistance, image understanding, and function calling
- Magistral-Small-2507 - Mistral Small 3.1 (2503) with added reasoning capabilities
- Devstral-Small-2507 - an agentic LLM for software engineering tasks, fine-tuned from Mistral-Small-3.1
- Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
- Mellum-4b-base - an LLM optimized for code-related tasks
- OlympicCoder-32B - a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics
- NextCoder - a family of code-editing LLMs developed using the Qwen2.5-Coder Instruct variants as base
- GLM-4.5 - a collection of hybrid reasoning models designed for intelligent agents
- Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
- Phi-4-mini-instruct - a lightweight open model built upon synthetic data and filtered publicly available websites
- Granite-3.3-2B-Instruct - an LLM fine-tuned for improved reasoning and instruction-following capabilities
- Qwen-Image - an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing
- chatterbox - the first production-grade open-source TTS model
- Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
- Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context extension methods
- HunyuanWorld-1 - an open-source 3D world generation model
- Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
OpenHands - a platform for software development agents powered by AI
cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
aider - AI pair programming in your terminal
tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
Roo-Code - a whole dev team of AI agents in your code editor
goose - an open-source, extensible AI agent that goes beyond code suggestions
opencode - an AI coding agent built for the terminal
kilocode - an open-source AI coding assistant for planning, building, and fixing code
AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
langchain - build context-aware reasoning applications
langflow - a powerful tool for building and deploying AI-powered agents and workflows
autogen - a programming framework for agentic AI
llama_index - the leading framework for building LLM-powered agents over your data
crewAI - a framework for orchestrating role-playing, autonomous AI agents
agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
camel - the first and the best multi-agent framework
openai-agents-python - a lightweight, powerful framework for multi-agent workflows
ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
ragbits - building blocks for rapid development of GenAI applications
graphrag - a modular graph-based RAG system
LightRAG - simple and fast RAG
graphiti - build real-time knowledge graphs for AI Agents
vanna - an open-source Python RAG framework for SQL generation and related functionality
open-interpreter - a natural language interface for computers
OmniParser - a simple screen-parsing tool for pure vision-based GUI agents
self-operating-computer - a framework to enable multimodal models to operate a computer
cua - the Docker Container for Computer-Use AI Agents
Agent-S - an open agentic framework that uses computers like a human
puppeteer - a JavaScript API for Chrome and Firefox
playwright - a framework for Web Testing and Automation
Playwright MCP server - an MCP server that provides browser automation capabilities using Playwright
browser-use - make websites accessible for AI agents
firecrawl - turn entire websites into LLM-ready markdown or structured data
stagehand - the AI Browser Automation Framework
mem0 - universal memory layer for AI Agents
letta - the stateful agents framework with memory, reasoning, and context management
cognee - memory for AI Agents in 5 lines of code
langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
openllmetry - open-source observability for your LLM application, based on OpenTelemetry
giskard - open-source evaluation and testing for AI & LLM systems
agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place
Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
gpt-researcher - an LLM-based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
local-deep-researcher - fully local web research and report writing assistant
SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
local-deep-research - an AI-powered research assistant for deep, iterative research
maestro - an AI-powered research application designed to streamline complex research tasks
open-notebook - an open-source implementation of Notebook LM with more flexibility and features
Kiln - the easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets
augmentoolkit - train an open-source LLM on new facts
presenton - an open-source AI presentation generator and API
OmniGen2 - exploration to advanced multimodal generation
4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
Digital Spaceport - reviews of various builds designed for LLM inference
JetsonHacks - information about developing on NVIDIA Jetson Development Kits
Miyconst - tests of various types of hardware capable of running LLMs
Alex Ziskind - tests of PCs, laptops, GPUs, etc. capable of running LLMs
- LLM Inference VRAM & GPU Requirement Calculator - calculate how many GPUs you need to deploy LLMs
Prompt Engineering by NirDiamant - a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies
Prompting guide 101 - a quick-start handbook for effective prompts by Google
Prompt Engineering by Google - prompt engineering by Google
Prompt Engineering by Anthropic - prompt engineering by Anthropic
Prompt Engineering Interactive Tutorial - an interactive prompt engineering tutorial by Anthropic
Real world prompting - real world prompting tutorial by Anthropic
Prompt evaluations - prompt evaluations course by Anthropic
system-prompts-and-models-of-ai-tools - a collection of system prompts extracted from AI tools
Context7 MCP Server - up-to-date code documentation for LLMs and AI code editors
Prompt from Codex - the prompt used to steer the behavior of OpenAI's Codex
Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems
GenAI Agents - tutorials and implementations for various Generative AI Agent techniques
Agents towards production - end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches
601 real-world gen AI use cases - 601 real-world gen AI use cases from the world's leading organizations by Google
A practical guide to building agents - a practical guide to building agents by OpenAI
RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.