- vllm (Public)
  Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
  Python · Apache License 2.0 · Updated May 12, 2025

- optimum-habana (Public)
  Forked from huggingface/optimum-habana. Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU).
  Python · Apache License 2.0 · Updated May 9, 2025

- diffusers (Public)
  Forked from huggingface/diffusers. 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
  Python · Apache License 2.0 · Updated Oct 24, 2024

- neural-compressor (Public)
  Forked from onnx/neural-compressor.
  Python · Apache License 2.0 · Updated Jul 31, 2024

- auto-round (Public)
  Forked from intel/auto-round. SOTA weight-only quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
  Python · Apache License 2.0 · Updated Jun 8, 2024

- optimum-intel (Public)
  Forked from huggingface/optimum-intel. 🤗 Optimum Intel: Accelerate inference with Intel optimization tools.
  Python · Apache License 2.0 · Updated Sep 15, 2023

- onnxruntime-inference-examples (Public)
  Forked from microsoft/onnxruntime-inference-examples. Examples for using ONNX Runtime for machine learning inferencing.
  Python · MIT License · Updated Aug 13, 2023

- onnxruntime (Public)
  Forked from microsoft/onnxruntime. ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator.
  C++ · MIT License · Updated Jul 25, 2023

- models (Public)
  Forked from onnx/models. A collection of pre-trained, state-of-the-art models in the ONNX format.
  Jupyter Notebook · Apache License 2.0 · Updated Nov 23, 2022