Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24)
Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025
GlazeWM is a tiling window manager for Windows inspired by i3wm.
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
pku-liang / Sanger
Forked from hatsu3/Sanger
A co-design architecture on sparse attention
Fast and memory-efficient exact attention
SiT Dataset: Socially Interactive Pedestrian Trajectory Dataset for Social Navigation Robots [NeurIPS 2023]
Verilog AXI components for FPGA implementation
Fast and accurate DRAM power and energy estimation tool
A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the…
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
An FPGA-supported RISC-V CPU with a 5-stage pipeline, implemented in Verilog HDL
Inference code for AI Challenge (Dec 2020)
TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference
Layer-wise Pruning of Transformer Heads for Efficient Language Modeling
AlexeyAB / darknet
Forked from pjreddie/darknet
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite