zhxfl

Organizations: @PaddlePaddle


The real story of the hardship and darkness behind the development of Noah's Pangu large model.

11,392 stars · 1,344 forks · Updated Jul 9, 2025

A lightweight design for computation-communication overlap.

Python · 212 stars · 10 forks · Updated Jan 20, 2026

An easy to use PyTorch to TensorRT converter

Python · 4,845 stars · 697 forks · Updated Aug 17, 2024

Implementation of popular deep learning networks with TensorRT network definition API

C++ · 7,658 stars · 1,862 forks · Updated Jan 20, 2026

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ · 966 stars · 393 forks · Updated Jan 21, 2026

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 34,660 stars · 3,366 forks · Updated Jan 22, 2026

Assembler for NVIDIA Maxwell architecture

Sass · 1,060 stars · 172 forks · Updated Jan 3, 2023

A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2

C++ · 2,215 stars · 301 forks · Updated Jan 22, 2026

Compiler Infrastructure for Neural Networks

C++ · 147 stars · 114 forks · Updated Jul 18, 2023

Deploy your model with TensorRT quickly.

C++ · 764 stars · 100 forks · Updated Nov 21, 2023

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ · 19,047 stars · 3,660 forks · Updated Jan 22, 2026

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.

C++ · 4,668 stars · 749 forks · Updated Jul 29, 2024

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

Cuda · 578 stars · 94 forks · Updated Jul 18, 2025

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation.

C++ · 4,808 stars · 550 forks · Updated Oct 24, 2024

a language for fast, portable data-parallel computation

C++ · 6,536 stars · 1,095 forks · Updated Jan 21, 2026

Assembler for NVIDIA Volta and Turing GPUs

Python · 238 stars · 41 forks · Updated Jan 13, 2022

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ · 25 stars · 7 forks · Updated Feb 9, 2021

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ · 4,995 stars · 760 forks · Updated Feb 8, 2024

It's like cat, but for images.

C · 914 stars · 35 forks · Updated Oct 21, 2025

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Jupyter Notebook · 10,103 stars · 1,323 forks · Updated Nov 9, 2023

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ · 13,970 stars · 2,175 forks · Updated Jan 22, 2026

Kernel Fusion and Runtime Compilation Based on NNVM

C++ · 72 stars · 25 forks · Updated Nov 21, 2016

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory

C++ · 299 stars · 76 forks · Updated Nov 28, 2018

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Python · 1,334 stars · 186 forks · Updated Jul 8, 2025

Subpart source code of deepcore v0.7

C · 27 stars · 14 forks · Updated Jun 28, 2020

Reference implementation of real-time autoregressive wavenet inference

Cuda · 745 stars · 126 forks · Updated Jan 19, 2021

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Python · 2,988 stars · 949 forks · Updated Jul 6, 2023

Facebook AI Research's Automatic Speech Recognition Toolkit

C++ · 6,446 stars · 1,001 forks · Updated Jan 12, 2026

CUDA Tensor Transpose (cuTT) library

C++ · 53 stars · 29 forks · Updated Aug 10, 2017

Convolutional Neural Networks

C · 26,428 stars · 21,232 forks · Updated May 3, 2024