Software and hardware co-optimization for sparse neural network workloads using the generalized sparse matrix-matrix multiplication hardware accelerator OuterSPACE

C++ 7 Updated May 18, 2021

SamsungLabs / Butterfly_Acc

Verilog 12 3 Updated Jun 25, 2025

thu-nics / DiTFastAttn

Jupyter Notebook 185 10 Updated Jan 14, 2025

Purdue-SoCET / atalla

Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.

SystemVerilog 18 3 Updated Nov 12, 2025

sudarshansdr / Transformer-Scaled-Dot-Product-Attention-Module-Hardware-Design-

Verilog 2 Updated Nov 30, 2024

GATECH-EIC / ViTCoD

[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

Python 123 13 Updated Jun 27, 2023

ZionK1 / DynamiQK

Dynamic pattern-driven optimizations for QxK

Scala 3 Updated Jul 23, 2025

casys-kaist / oaken

Artifact for Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization

Python 7 2 Updated May 9, 2025

Learning-Chips-Lab / OpenEye

The Open Source Hardware Accelerator for Efficient Neural Network Inference

Python 48 8 Updated Nov 9, 2025

mohit-0310 / Cycle-Accurate-Simulator-for-Network-on-Chip

Simulate a NoC router and a 3x3 NoC mesh containing nine routers.

Python 4 Updated Dec 13, 2023

tissue3 / EyerissSimulator

Eyeriss chip simulator

Python 38 8 Updated Mar 6, 2020

CLab-HKUST-GZ / micro58-axcore

Python 20 4 Updated Oct 21, 2025

scale-snu / LLMSimulator

C++ 23 2 Updated Oct 14, 2025

pku-liang / hlcd-spmm-project

Course Project for High Level Chip Design （高层次芯片设计）

C++ 17 6 Updated Jan 2, 2025

PSAL-POSTECH / PyTorchSim

PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework

Python 42 3 Updated Nov 13, 2025

PSAL-POSTECH / ONNXim

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference

C++ 166 28 Updated Feb 10, 2025

georgia-tech-synergy-lab / SIGMA

RTL implementation of Flex-DPE.

Verilog 115 32 Updated Feb 22, 2020

google-coral / coralnpu

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,781 186 Updated Nov 13, 2025

Zhu-Zixuan / Bitlet-PE

A bit-level sparsity-aware multiply-accumulate processing element.

Verilog 18 1 Updated Jul 9, 2024

arkhadem / aim_simulator

Forked from CMU-SAFARI/ramulator2

A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0

C++ 43 8 Updated Jul 22, 2025

Jintao Xu Taoger-Xu

Lists (1)

simulator

Stars

karthisugumar / CSE240D-Hierarchical_Mesh_NoC-Eyeriss_v2

mikeroyal / CoWoS-Guide

xlite-dev / LeetCUDA

stanford-cs336 / assignment2-systems

Starmys / TritonStudyGroup

attention-survey / Efficient_Attention_Survey

adamgallas / MIT_Bluespec_RISCV_Tutorial

anneouyang / OuterSPACE