Stars
Havenask is a large-scale distributed information search system widely used within Alibaba Group
A distributed, fast open-source graph database featuring horizontal scalability and high availability
PerfKit Benchmarker (PKB) contains a set of benchmarks to measure and compare cloud offerings. The benchmarks use default settings to reflect what most users will see. PerfKit Benchmarker is licens…
Enjoy the magic of Diffusion models!
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
FlashInfer: Kernel Library for LLM Serving
SGLang is a high-performance serving framework for large language models and multimodal models.
A guidance language for controlling large language models.
Sourcetrail - free and open-source interactive source explorer
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A family of compressed models obtained via pruning and knowledge distillation
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, reducing inference latency by up to 10x for pre-filli…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
llama3 implementation, one matrix multiplication at a time
Transformer-related optimizations, including BERT and GPT
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A high-throughput and memory-efficient inference and serving engine for LLMs
Continuous Profiling Platform. Debug performance issues down to a single line of code
DeepSeek Coder: Let the Code Write Itself
Modeling, training, eval, and inference code for OLMo
Universal LLM Deployment Engine with ML Compilation
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Tutel MoE: an optimized Mixture-of-Experts library supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4
APM: Application Performance Monitoring system
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming