zhxfl

154 followers · 6 following

Stars

HW-whistleblower / True-Story-of-Pangu

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,367 1,347 Updated Jul 9, 2025

infinigence / FlashOverlap

A lightweight design for computation-communication overlap.

Cuda 207 9 Updated Dec 25, 2025

NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

Python 4,840 696 Updated Aug 17, 2024

wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API

C++ 7,619 1,864 Updated Dec 20, 2025

onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 958 387 Updated Dec 10, 2025

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 34,475 3,333 Updated Dec 31, 2025

NervanaSystems / maxas

Assembler for NVIDIA Maxwell architecture

Sass 1,058 172 Updated Jan 3, 2023

herumi / xbyak

A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2

C++ 2,211 301 Updated Dec 29, 2025

PaddlePaddle / CINN

Compiler Infrastructure for Neural Networks

C++ 147 114 Updated Jul 18, 2023

zerollzeng / tiny-tensorrt

Deploy your model with TensorRT quickly.

C++ 765 100 Updated Nov 21, 2023

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 18,836 3,624 Updated Dec 31, 2025

mindspore-ai / mindspore

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.

C++ 4,655 748 Updated Jul 29, 2024

DavidDiazGuerra / gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

Cuda 572 93 Updated Jul 18, 2025

MegEngine / MegEngine

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation.

C++ 4,806 550 Updated Oct 24, 2024

halide / Halide

a language for fast, portable data-parallel computation

C++ 6,494 1,096 Updated Dec 28, 2025

daadaada / turingas

Assembler for NVIDIA Volta and Turing GPUs

Python 236 40 Updated Jan 13, 2022

jeng1220 / cuGemmProf

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ 25 7 Updated Feb 9, 2021

NVIDIA / thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,999 761 Updated Feb 8, 2024

eddieantonio / imgcat

It's like cat, but for images.

C 913 34 Updated Oct 21, 2025

mozilla / TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Jupyter Notebook 10,089 1,327 Updated Nov 9, 2023

alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 13,829 2,152 Updated Dec 30, 2025

dmlc / nnvm-fusion

Kernel Fusion and Runtime Compilation Based on NNVM

C++ 72 26 Updated Nov 21, 2016

NVIDIA / cnmem

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory

C++ 299 76 Updated Nov 28, 2018

mapillary / inplace_abn

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Python 1,334 186 Updated Jul 8, 2025

XiuYuLi / deepcore_source_code

Subpart source code of deepcore v0.7

C 27 14 Updated Jun 28, 2020

NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference

Cuda 744 126 Updated Jan 19, 2021

keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Python 2,990 951 Updated Jul 6, 2023

flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

C++ 6,444 1,001 Updated Nov 7, 2025

ap-hynninen / cutt

CUDA Tensor Transpose (cuTT) library

C++ 53 28 Updated Aug 10, 2017

pjreddie / darknet

Convolutional Neural Networks

C 26,419 21,235 Updated May 3, 2024