Stars
A SystemVerilog implementation of Row-Stationary dataflow and Hierarchical Mesh Network-on-Chip Architecture based on Eyeriss CNN Accelerator
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
Software and hardware co-optimization for sparse neural network workloads using the generalized sparse matrix-matrix multiplication hardware accelerator OuterSPACE
Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Artifact for Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
The Open Source Hardware Accelerator for Efficient Neural Network Inference
Simulate a NoC router and a 3x3 NoC mesh containing nine routers.
Course Project for High Level Chip Design (高层次芯片设计)
PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
RTL implementation of Flex-DPE.
A machine learning accelerator core designed for energy-efficient AI at the edge.
A bit-level sparsity-aware multiply-accumulate processing element.
arkhadem / aim_simulator
Forked from CMU-SAFARI/ramulator2
A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0
Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"