
zhxfl

Organizations

@PaddlePaddle


The true story of the hardship and darkness behind the development of Noah's Pangu large model.

11,367 stars · 1,347 forks · Updated Jul 9, 2025

A lightweight design for computation-communication overlap.

Cuda · 207 stars · 9 forks · Updated Dec 25, 2025

An easy to use PyTorch to TensorRT converter

Python · 4,840 stars · 696 forks · Updated Aug 17, 2024

Implementation of popular deep learning networks with TensorRT network definition API

C++ · 7,619 stars · 1,864 forks · Updated Dec 20, 2025

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ · 958 stars · 387 forks · Updated Dec 10, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 34,475 stars · 3,333 forks · Updated Dec 31, 2025

Assembler for NVIDIA Maxwell architecture

Sass · 1,058 stars · 172 forks · Updated Jan 3, 2023

A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2

C++ · 2,211 stars · 301 forks · Updated Dec 29, 2025

Compiler Infrastructure for Neural Networks

C++ · 147 stars · 114 forks · Updated Jul 18, 2023

Deploy your model with TensorRT quickly.

C++ · 765 stars · 100 forks · Updated Nov 21, 2023

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ · 18,836 stars · 3,624 forks · Updated Dec 31, 2025

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.

C++ · 4,655 stars · 748 forks · Updated Jul 29, 2024

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

Cuda · 572 stars · 93 forks · Updated Jul 18, 2025

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ · 4,806 stars · 550 forks · Updated Oct 24, 2024

A language for fast, portable data-parallel computation

C++ · 6,494 stars · 1,096 forks · Updated Dec 28, 2025

Assembler for NVIDIA Volta and Turing GPUs

Python · 236 stars · 40 forks · Updated Jan 13, 2022

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ · 25 stars · 7 forks · Updated Feb 9, 2021

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ · 4,999 stars · 761 forks · Updated Feb 8, 2024

It's like cat, but for images.

C · 913 stars · 34 forks · Updated Oct 21, 2025

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Jupyter Notebook · 10,089 stars · 1,327 forks · Updated Nov 9, 2023

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ · 13,829 stars · 2,152 forks · Updated Dec 30, 2025

Kernel Fusion and Runtime Compilation Based on NNVM

C++ · 72 stars · 26 forks · Updated Nov 21, 2016

A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory

C++ · 299 stars · 76 forks · Updated Nov 28, 2018

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Python · 1,334 stars · 186 forks · Updated Jul 8, 2025

Partial source code of deepcore v0.7

C · 27 stars · 14 forks · Updated Jun 28, 2020

Reference implementation of real-time autoregressive wavenet inference

Cuda · 744 stars · 126 forks · Updated Jan 19, 2021

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Python · 2,990 stars · 951 forks · Updated Jul 6, 2023

Facebook AI Research's Automatic Speech Recognition Toolkit

C++ · 6,444 stars · 1,001 forks · Updated Nov 7, 2025

CUDA Tensor Transpose (cuTT) library

C++ · 53 stars · 28 forks · Updated Aug 10, 2017

Convolutional Neural Networks

C · 26,419 stars · 21,235 forks · Updated May 3, 2024