Stars
A lightweight, powerful framework for multi-agent workflows
AI application DSL workflows built independently on Dify, free to use; whether for personal needs or learning purposes, they can open up an intelligent journey full of possibilities.
Sharing some useful Dify DSL workflows, suitable both for personal use and for learning.
Production-ready platform for agentic workflow development.
A higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI LLM service implemented with GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calls, AI agents…
This is the RAG Modules repo, which collects various modules from the RAG ecosystem.
[SIGIR 2024] Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
RAGChecker: A Fine-grained Framework For Diagnosing RAG
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Retrieval and Retrieval-augmented LLMs
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
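As a rough illustration of that Python API, here is a minimal sketch using the high-level LLM entry point found in recent TensorRT-LLM releases; the import path, constructor arguments, and the example model name are assumptions and vary across versions.

```python
# Minimal sketch of TensorRT-LLM's high-level Python API (recent releases).
# Import path, arguments, and model name are assumptions; check your version's docs.
from tensorrt_llm import LLM, SamplingParams

# Builds or loads a TensorRT engine for the given Hugging Face checkpoint.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(out.outputs[0].text)
```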
The Triton TensorRT-LLM Backend
The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity".
FinGLM: dedicated to building an open, non-profit, and long-lasting financial LLM project, using open source to advance "AI + finance".
canghongjian / vllm
Forked from vllm-project/vllm. ChatGLM2 support for vLLM.
Llama Chinese community: aggregates the latest Llama learning resources in real time and builds the best open-source ecosystem for Chinese Llama LLMs; fully open source and commercially usable.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain with language models such as ChatGLM, Qwen, and Llama; a local-knowledge-based LLM application.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A high-throughput and memory-efficient inference and serving engine for LLMs
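For context, this is vLLM's standard offline-batching quickstart; the small OPT checkpoint is only an example model.

```python
# Offline batched inference with vLLM; facebook/opt-125m is only an example checkpoint.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")              # loads weights and allocates the paged KV cache
outputs = llm.generate(prompts, sampling_params)  # prompts are batched continuously on the GPU

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```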
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
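A minimal sketch of serving a chat model with LMDeploy's pipeline API; the model identifier is an example and details may differ between releases.

```python
# Minimal LMDeploy usage sketch; the model identifier is an example only.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")   # downloads the model and starts the inference engine
responses = pipe(["Hi, please introduce yourself.", "What is quantization?"])
print(responses)
```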
The official GitHub page for the survey paper "A Survey of Large Language Models".
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
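A sketch of what that conversational workflow looks like with PandasAI's 2.x SmartDataframe API (entry points differ in other releases); the CSV path, LLM backend, and question are assumptions.

```python
# Conversational data analysis with PandasAI 2.x; file name and question are placeholders.
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.read_csv("sales.csv")                       # any tabular source: SQL, CSV, parquet, ...
sdf = SmartDataframe(df, config={"llm": OpenAI()})  # wraps the frame with an LLM backend

print(sdf.chat("Which region had the highest revenue last quarter?"))
```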
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
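As a rough sketch of that API, quantization boils down to loading a model with a quantize config, calibrating on a few tokenized examples, and saving the result; the model name and the single calibration sentence below are placeholders.

```python
# GPTQ post-training quantization with AutoGPTQ; the model name and the single
# calibration example are placeholders (real calibration needs many samples).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

examples = [tokenizer("AutoGPTQ is an easy-to-use quantization package based on GPTQ.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

model.quantize(examples)               # runs the GPTQ algorithm layer by layer
model.save_quantized("opt-125m-4bit")  # reload with AutoGPTQForCausalLM.from_quantized(...)
```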
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
4-bit quantization of LLaMA using GPTQ.