- ISTA
- Vienna, Austria
- in/blacksamorez
- https://blog.panferov.org/
Stars
An iOS app that integrates a Large Language Model (LLM) to process audio recordings for transcription and summarization.
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression".
Friends don't let friends make certain types of data visualization: what they are and why they are bad.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
Meditron is a suite of open-source medical Large Language Models (LLMs).
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).
💎 A site containing a systematic review of optimization methods and theory.
This repository is the official implementation of 'EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning' (ICML 2022).
A nasty project from the 2014 Microsoft Research Summer School.