


Python 58 10 Updated Oct 28, 2025

An iOS app that integrates a Large Language Model (LLM) to process audio recordings for transcription and summarization.

C++ 16 1 Updated Nov 29, 2024

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 124 9 Updated Oct 30, 2025
Jupyter Notebook 103 10 Updated Oct 30, 2025

First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

Python 44 2 Updated Sep 18, 2025

Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.

TypeScript 51,838 7,559 Updated Oct 30, 2025

Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".

Python 8 Updated Jun 18, 2024
Python 152 15 Updated Jun 22, 2025

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 145 19 Updated Aug 21, 2025

Technical Note: From C++98 to C++2x

143 12 Updated Jun 8, 2025

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Python 118 5 Updated Mar 6, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 923 77 Updated Sep 4, 2024

QuIP quantization

Python 59 6 Updated Mar 17, 2024

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,301 188 Updated Aug 8, 2025
Go 4 Updated Feb 18, 2024

Friends don't let friends make certain types of data visualization - What are they and why are they bad.

R 6,901 278 Updated Sep 3, 2025
Python 561 49 Updated Oct 29, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,209 183 Updated Mar 27, 2024

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 277 23 Updated Nov 3, 2023

Meditron is a suite of open-source medical Large Language Models (LLMs).

Python 2,104 203 Updated Apr 10, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,293 77 Updated Mar 6, 2025

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 183 13 Updated Apr 16, 2024

💎A site, that contains systematic optimization methods and theory review

Jupyter Notebook 123 106 Updated Aug 29, 2025

distributed trainer for LLMs

Python 583 85 Updated May 20, 2024

Minimalist ML framework for Rust

Rust 18,435 1,280 Updated Oct 30, 2025

This repository is the official implementation of 'EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning' (ICML 2022).

Jupyter Notebook 14 1 Updated Aug 2, 2022

A nasty project for the 2014's Microsoft Research Summer School.

JavaScript 1 1 Updated Mar 7, 2019

Inference Llama 2 in one file of pure C

C 18,891 2,394 Updated Aug 6, 2024
Python 26 4 Updated Aug 25, 2023