Gradually-Warmup Learning Rate Scheduler for PyTorch
Updated Oct 10, 2024 - Python
Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)
Polynomial Learning Rate Decay Scheduler for PyTorch
A PyTorch Extension for Applied Mathematics
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch
PyTorch extension for openml-python
Template for CUDA / C++ extension writing with PyTorch
XNOR-Net with binary conv2d kernels using an XNOR GEMM op; supports both CPU and GPU.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Pre-compiled custom CUDA extension for Block Sparse Attention (Python 3.11 / PyTorch 2.6.0+cu124).
Binding C++ to PyTorch and extending PyTorch
The numerical continuity layer for GPU computing
PyTorch extension for alternative backward rules and gradient transforms (STE, gradient jamming, non-standard activations).
Simple examples of tools and libraries
High-performance matrix engine for Unit-Domain Flow (UDF). Eliminates Mantissa Friction with 0.00 MSE integrity.
CUDA kernels for LLM decode-stage inference, built as a PyTorch extension with correctness tests and latency benchmarks.