
shixianc


Area of focus: LLM Inference Optimization | Multimodal Training

1 follower · 0 following

in/shixian-cui-2814ab187


shixianc/README.md

I'm Shixian.

Interested in getting models to run faster.

Pinned

vllm-project/vllm (Public)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 62.3k stars · 11.1k forks

NVIDIA/TensorRT-LLM (Public)

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ · 12.1k stars · 1.8k forks

deepseek-ai/DeepGEMM (Public)

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5.9k stars · 737 forks

flash-attention-minimal (Public)

Forked from tspeterkim/flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda

QwenLM/Qwen2-Audio (Public)

The official repo of Qwen2-Audio, the chat and pretrained large audio language models proposed by Alibaba Cloud.

Python · 1.9k stars · 146 forks