Awesome-local-LLM

A curated list of awesome platforms, tools, practices and resources that help run LLMs locally

Inference platforms

LM Studio - discover, download and run local LLMs
lemonade - a local LLM server with GPU and NPU Acceleration

Inference engines

ollama - get up and running with LLMs
llama.cpp - LLM inference in C/C++
ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
koboldcpp - run GGUF models easily with a KoboldAI UI
vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
Nano-vLLM - a lightweight vLLM implementation built from scratch
vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
exo - run your own AI cluster at home with everyday devices
sglang - a fast serving framework for large language models and vision language models

User Interfaces

Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Page Assist - Use your locally running AI models to assist you in your web browsing

Large Language Models

Explorers, Benchmarks, Leaderboards

AI Models & API Providers Analysis - understand the AI landscape to choose the best model and provider for your use case
LLM Explorer - explore a list of open-source LLMs
Dubesor LLM Benchmark table - small-scale manual performance comparison benchmark
oobabooga benchmark - a list sorted by size (on disk) for each score

Model providers

Qwen - powered by Alibaba Cloud
Mistral AI - a pioneering French artificial intelligence startup
Tencent - a profile of a Chinese multinational technology conglomerate and holding company
Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
bartowski - providing GGUF versions of popular LLMs
Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets

Specific models

Qwen3 - a collection of the latest generation Qwen LLMs
Qwen3-Coder - a collection of Qwen's most agentic code models to date
Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
OpenAI-o3-open - OpenAI's very first open-weight LLM
Mistral-Small-3.2-24B-Instruct-2506 - a versatile model designed to handle a wide range of generative AI tasks, including instruction following, conversational assistance, image understanding, and function calling
Magistral-Small-2507 - Mistral Small 3.1 (2503) with added reasoning capabilities
Devstral-Small-2507 - an agentic LLM for software engineering tasks fine-tuned from Mistral-Small-3.1
Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
Mellum-4b-base - an LLM optimized for code-related tasks
OlympicCoder-32B - a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics
NextCoder - a family of code-editing LLMs developed using the Qwen2.5-Coder Instruct variants as base
GLM-4.5 - a collection of hybrid reasoning models designed for intelligent agents
Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
Phi-4-mini-instruct - a lightweight open model built upon synthetic data and filtered publicly available websites
Granite-3.3-2B-Instruct - an LLM fine-tuned for improved reasoning and instruction-following capabilities
Qwen-Image - an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing
chatterbox - first production-grade open-source TTS model
Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context extension methods
HunyuanWorld-1 - an open-source 3D world generation model
Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences

Tools

Coding Agents

OpenHands - a platform for software development agents powered by AI
cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
aider - AI pair programming in your terminal
tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
Roo-Code - a whole dev team of AI agents in your code editor
goose - an open-source, extensible AI agent that goes beyond code suggestions
opencode - an AI coding agent built for the terminal
kilocode - open source AI coding assistant for planning, building, and fixing code

Agent Frameworks

AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
langchain - build context-aware reasoning applications
langflow - a powerful tool for building and deploying AI-powered agents and workflows
autogen - a programming framework for agentic AI
llama_index - the leading framework for building LLM-powered agents over your data
crewAI - a framework for orchestrating role-playing, autonomous AI agents
agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
camel - the first and the best multi-agent framework
openai-agents-python - a lightweight, powerful framework for multi-agent workflows
ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
ragbits - building blocks for rapid development of GenAI applications

Retrieval-Augmented Generation

graphrag - a modular graph-based RAG system
LightRAG - simple and fast RAG
graphiti - build real-time knowledge graphs for AI Agents
vanna - an open-source Python RAG framework for SQL generation and related functionality

Computer Use

open-interpreter - a natural language interface for computers
OmniParser - a simple screen parsing tool towards a pure vision-based GUI agent
self-operating-computer - a framework to enable multimodal models to operate a computer
cua - the Docker Container for Computer-Use AI Agents
Agent-S - an open agentic framework that uses computers like a human

Browser Automation

puppeteer - a JavaScript API for Chrome and Firefox
playwright - a framework for Web Testing and Automation
Playwright MCP server - an MCP server that provides browser automation capabilities using Playwright
browser-use - make websites accessible for AI agents
firecrawl - turn entire websites into LLM-ready markdown or structured data
stagehand - the AI Browser Automation Framework

Memory Management

mem0 - universal memory layer for AI Agents
letta - the stateful agents framework with memory, reasoning, and context management
cognee - memory for AI Agents in 5 lines of code

Testing, Evaluation, and Observability

langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
openllmetry - open-source observability for your LLM application, based on OpenTelemetry
giskard - open-source evaluation and testing for AI and LLM systems
agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place

Research

Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
gpt-researcher - an LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
local-deep-researcher - fully local web research and report writing assistant
SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
local-deep-research - an AI-powered research assistant for deep, iterative research
maestro - an AI-powered research application designed to streamline complex research tasks
open-notebook - an open-source implementation of Notebook LM with more flexibility and features

Training and Fine-tuning

Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
augmentoolkit - train an open-source LLM on new facts

Miscellaneous

presenton - an open-source AI presentation generator and API
OmniGen2 - exploration to advanced multimodal generation
4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure

Hardware

Digital Spaceport - reviews of various builds designed for LLM inference
JetsonHacks - information about developing on NVIDIA Jetson Development Kits
Miyconst - tests of various types of hardware capable of running LLMs
Alex Ziskind - tests of PCs, laptops, GPUs, etc. capable of running LLMs
LLM Inference VRAM & GPU Requirement Calculator - calculate how many GPUs you need to deploy LLMs

Tutorials

Models

Let's reproduce GPT-2 (124M)

Prompt Engineering

Prompt Engineering by NirDiamant - a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies
Prompting guide 101 - a quick-start handbook for effective prompts by Google
Prompt Engineering by Google - prompt engineering by Google
Prompt Engineering by Anthropic - prompt engineering by Anthropic
Prompt Engineering Interactive Tutorial - an interactive prompt engineering tutorial by Anthropic
Real world prompting - real world prompting tutorial by Anthropic
Prompt evaluations - prompt evaluations course by Anthropic
system-prompts-and-models-of-ai-tools - a collection of system prompts extracted from AI tools
Context7 MCP Server - up-to-date code documentation for LLMs and AI code editors
Prompt from Codex - the prompt used to steer the behavior of OpenAI's Codex

Context Engineering

Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems

Agents

GenAI Agents - tutorials and implementations for various Generative AI Agent techniques
Agents towards production - end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches
601 real-world gen AI use cases - 601 real-world gen AI use cases from the world's leading organizations by Google
A practical guide to building agents - a practical guide to building agents by OpenAI

Retrieval-Augmented Generation

RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python

Miscellaneous

Self-hosted AI coding that just works

Communities

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.

silentx3-coder/Awesome-local-LLM
