zlsh80826
🏠 Working from home
  • NVIDIA
  • Taipei, Taiwan

Starred repositories

DLRover: An Automatic Distributed Deep Learning System

Python 1,613 203 Updated Dec 25, 2025

nanobind: tiny and efficient C++/Python bindings

C++ 3,215 271 Updated Dec 19, 2025

Large Context Attention

Python 755 52 Updated Oct 13, 2025

The Foundation for All Legate Libraries

C++ 233 63 Updated Dec 24, 2025

Convert .ninja_log files to chrome's about:tracing format.

Python 496 53 Updated Jun 5, 2024

My notes of Clean Code book

6,083 841 Updated Nov 26, 2023

C/C++ Performance Profiler

C++ 4,307 360 Updated Jan 31, 2025

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 368 74 Updated Dec 24, 2025

JAX-Toolbox

Python 369 68 Updated Dec 24, 2025

Useful shortcuts for bash/zsh

1,046 127 Updated Dec 23, 2025

Benchmarking unity builds on real c++ projects.

Shell 15 1 Updated Jul 5, 2020

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,031 588 Updated Dec 22, 2025

A Python framework for accelerated simulation, data generation and spatial computing.

Python 5,962 404 Updated Dec 24, 2025

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook 6,987 770 Updated Dec 24, 2025

Task-based datasets, preprocessing, and evaluation for sequence models.

Python 591 59 Updated Nov 17, 2025

Python 2,926 333 Updated Dec 10, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 34,417 3,319 Updated Dec 25, 2025

A tool to classify GPU kernel information and compute statistics on it.

Python 10 Updated Jun 25, 2024

C++ Design Patterns

C++ 4,494 973 Updated May 12, 2024

📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/

C++ 25,256 3,086 Updated Aug 17, 2024

An efficient GPU resource sharing system with fine-grained control for Linux platforms.

C++ 87 31 Updated Mar 25, 2024

C++ 3 Updated Aug 12, 2020

NumPy and SciPy on Multi-Node Multi-GPU systems

Python 964 88 Updated Dec 24, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,508 2,295 Updated Dec 11, 2025

AddressSanitizer, ThreadSanitizer, MemorySanitizer

C 12,260 1,081 Updated Dec 12, 2025

Go 2 Updated Oct 11, 2025

CUDA Python: Performance meets Productivity

Cython 3,100 234 Updated Dec 24, 2025

Development repository for the Triton language and compiler

MLIR 17,933 2,469 Updated Dec 25, 2025