Stars
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition.
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering, with the fault tolerance and scalability of a cloud-native database.
A transformer-based LLM, written entirely in Rust.
Minimalistic 4D-parallelism distributed training framework for educational purposes
The financial transactions database designed for mission critical safety and performance.
Distributed query engine providing simple and reliable data processing for any modality and scale
Static, suckless, single-batch, CUDA-only Qwen3-0.6B mini inference engine
Java raft/config/mq/rpc engine, zero dependencies, 10X faster
Patterns and resources for low-latency programming.
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query processing
Cost-efficient and pluggable infrastructure components for GenAI inference
Open-source vector similarity search for Postgres
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
DuckLake is an integrated data lake and catalog format
AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the software layers that support training and inference of large AI models.
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Large Language Model (LLM) Systems Paper List
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Accelerate inference without tears
Upserts, deletes, and incremental processing on big data.