Stars
This repository contains integer operators on GPUs for PyTorch.
Dynamically Reconfigurable Architecture Template and Cycle-level Microarchitecture Simulator for Dataflow AcCelerators
dMazeRunner: Dataflow acceleration optimization infrastructure for coarse-grained programmable accelerators
Topics in Machine Learning Accelerator Design
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
PyTorch implementation of DiracDeltaNet from the paper Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
uL2Q: An Ultra-Low Loss Quantization Method for DNN Compression
The official implementation of the DAC 2024 paper GQA-LUT
Exploring through 7 popular datasets for visual object tracking, including OTB, UAV, VOT, LaSOT, NFS, TrackingNet and GOT-10k.
Simultaneous object detection and tracking using center points.
This was originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
Awesome machine learning model compression research papers, quantization, tools, and learning material.
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs adding works are welcome (p…
A curated list of awesome knowledge distillation papers and codes for object detection.
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.
🔥🔥🔥 An inference framework that supports multiple models: yolo5 (torch and tensorrt), yolox (torch and tensorrt), nanodet (tensorrt), yolo-fastestV2 (tensorrt) and yolov5-lite (tensorrt).
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.