mmangkad-dev
Popular repositories
- flash-attention-prebuild-wheels Public Forked from mjun0812/flash-attention-prebuild-wheels
Provides pre-built flash-attention package wheels for Linux and Windows platforms using GitHub Actions
Python
- flashinfer Public Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Python
- python-build-standalone Public Forked from astral-sh/python-build-standalone
Produce redistributable builds of Python
Python
- Model-Optimizer Public Forked from NVIDIA/Model-Optimizer
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Python
Repositories
- python-build-standalone Public Forked from astral-sh/python-build-standalone
Produce redistributable builds of Python
- tokenspeed Public Forked from lightseekorg/tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
- lm-evaluation-harness Public Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
- flash-attention-prebuild-wheels Public Forked from mjun0812/flash-attention-prebuild-wheels
Provides pre-built flash-attention package wheels for Linux and Windows platforms using GitHub Actions
- flash-attention Public Forked from vllm-project/flash-attention
Fast and memory-efficient exact attention
- Model-Optimizer Public Forked from NVIDIA/Model-Optimizer
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
People
This organization has no public members. You must be a member to see who’s a part of this organization.