🤖FFPA: extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim; 1.8x~3x↑🎉 vs SDPA EA.
Updated Feb 13, 2026 · CUDA
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.
General Matrix Multiplication using NVIDIA Tensor Cores
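As a flavor of what these Tensor Core GEMM projects involve, here is a minimal, hedged sketch of a half-precision GEMM tile using CUDA's WMMA API. It is illustrative only (the kernel name `wmma_hgemm_16x16` and the assumed memory layouts are not taken from any repo above); it assumes an sm_70+ GPU, dimensions that are multiples of 16, A stored row-major (M×K), and B stored column-major (K×N, i.e. `B[k + n*K]`).

```cuda
#include <mma.h>
using namespace nvcuda;

// Hypothetical minimal HGEMM kernel: each warp computes one 16x16 output
// tile of C = A * B, accumulating in float for accuracy.
// A: half, row-major, M x K.  B: half, column-major, K x N.  C: float, row-major.
__global__ void wmma_hgemm_16x16(const half *A, const half *B, float *C,
                                 int M, int N, int K) {
    // Map each warp to one 16x16 tile of C.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;
    int cRow = warpM * 16, cCol = warpN * 16;
    if (cRow >= M || cCol >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in 16-wide steps; mma_sync runs on the Tensor Cores.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + cRow * K + k, K);
        wmma::load_matrix_sync(b_frag, B + cCol * K + k, K);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C + cRow * N + cCol, c_frag, N, wmma::mem_row_major);
}
```

Production kernels in the repos above add shared-memory staging, swizzling to avoid bank conflicts, and multi-tile warps on top of this basic fragment pipeline.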
A benchmarking framework for correlators of FX telescope arrays
High-performance CUDA kernels with step-by-step optimization, profiling, and analysis. A growing collection of GPU solutions demonstrating warp-level tuning, memory optimization, and Tensor Core acceleration.
The MNIST classification problem is a fundamental machine learning task: recognizing handwritten digits (0–9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.
INT8 Sparse Tensor Core GEMM for PyTorch — built for Windows