A SystemVerilog implementation of Row-Stationary dataflow and Hierarchical Mesh Network-on-Chip Architecture based on Eyeriss CNN Accelerator

SystemVerilog 175 31 Updated Dec 14, 2019

Chip on Wafer on Substrate (CoWoS) Guide

C 43 3 Updated Feb 1, 2022

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,437 835 Updated Nov 6, 2025

Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch

Python 116 240 Updated Jul 25, 2025
Python 93 5 Updated Sep 22, 2025

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

222 5 Updated Aug 26, 2025

Software and hardware co-optimization for sparse neural network workloads using the generalized sparse matrix-matrix multiplication hardware accelerator OuterSPACE

C++ 7 Updated May 18, 2021
Verilog 12 3 Updated Jun 25, 2025
Jupyter Notebook 185 10 Updated Jan 14, 2025

Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.

SystemVerilog 18 3 Updated Nov 12, 2025

[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

Python 123 13 Updated Jun 27, 2023

Dynamic pattern-driven optimizations for QxK

Scala 3 Updated Jul 23, 2025

Artifact for Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization

Python 7 2 Updated May 9, 2025

The Open Source Hardware Accelerator for Efficient Neural Network Inference

Python 48 8 Updated Nov 9, 2025

Simulate a NoC router and a 3x3 NoC mesh containing nine routers.

Python 4 Updated Dec 13, 2023

Eyeriss chip simulator

Python 38 8 Updated Mar 6, 2020
Python 20 4 Updated Oct 21, 2025
C++ 23 2 Updated Oct 14, 2025

Course Project for High Level Chip Design (高层次芯片设计)

C++ 17 6 Updated Jan 2, 2025

PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework

Python 42 3 Updated Nov 13, 2025

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference

C++ 166 28 Updated Feb 10, 2025

RTL implementation of Flex-DPE.

Verilog 115 32 Updated Feb 22, 2020

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,781 186 Updated Nov 13, 2025

A bit-level sparsity-aware multiply-accumulate processing element.

Verilog 18 1 Updated Jul 9, 2024

A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0

C++ 43 8 Updated Jul 22, 2025

Processing-In-Memory (PIM) Simulator

C++ 199 65 Updated Dec 12, 2024

Nano vLLM

Python 8,730 1,056 Updated Nov 3, 2025

Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"

Python 47 10 Updated Sep 1, 2025