Stars
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates inference for any model.
Code and data for the Chain-of-Draft (CoD) paper
📰 Must-read papers and blogs on Speculative Decoding ⚡️
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)
Official Implementation (PyTorch) of the paper "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
[CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x.
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
Official Implementation of "Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding" (ICML'25)
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
A library for efficient similarity search and clustering of dense vectors (a minimal usage sketch follows this list).
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
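
Several entries above center on retrieval over dense vectors, and the Faiss entry describes a library built for exactly that. As a minimal, hedged sketch of that use case (not the API of any other repo in this list), the snippet below builds an exact L2 index and queries it; the dimensionality, random data, and the `faiss-cpu` install hint are illustrative assumptions.

```python
# Minimal Faiss sketch (illustrative only): exact L2 nearest-neighbor search
# over random dense vectors. Assumes `pip install faiss-cpu` and NumPy.
import numpy as np
import faiss

d = 64                                         # vector dimensionality (assumed)
rng = np.random.default_rng(0)
xb = rng.random((10_000, d), dtype="float32")  # database vectors to index
xq = rng.random((5, d), dtype="float32")       # query vectors

index = faiss.IndexFlatL2(d)                   # brute-force (exact) L2 index
index.add(xb)                                  # add database vectors
distances, ids = index.search(xq, 4)           # 4 nearest neighbors per query
print(ids)                                     # row i: neighbor ids for query xq[i]
```

For larger collections, Faiss also provides approximate indexes (e.g., IVF or HNSW variants) that trade a little recall for much lower search latency.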