Stars
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
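A minimal sketch of running one of these models through Hugging Face transformers; the repo id "openai/gpt-oss-20b" and the pipeline settings are assumptions, not taken from the entry above.

```python
# Hedged sketch: load gpt-oss-20b for text generation via transformers,
# assuming the weights are published under the repo id "openai/gpt-oss-20b".
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # pick a supported half precision automatically
    device_map="auto",    # shard across available GPUs
)
out = generator("Explain expert parallelism in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])
```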
Simple high-throughput inference library
DeepEP: an efficient expert-parallel communication library
React Native module for AppZung CodePush
Efficient 2:4 sparse training algorithms and implementations
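Not this repo's API: a plain-PyTorch sketch of the 2:4 pattern itself, where every contiguous group of four weights keeps only its two largest-magnitude entries (the pattern NVIDIA sparse tensor cores accelerate).

```python
# Conceptual 2:4 magnitude pruning: zero the 2 smallest-magnitude weights in
# every group of 4. prune_2_4 is a hypothetical helper, not the repo's API.
import torch

def prune_2_4(w: torch.Tensor) -> torch.Tensor:
    groups = w.reshape(-1, 4)
    _, drop = groups.abs().topk(2, dim=1, largest=False)  # 2 smallest per group
    return groups.scatter(1, drop, 0.0).reshape(w.shape)

w = torch.randn(128, 128)
w_24 = prune_2_4(w)
assert ((w_24.reshape(-1, 4) != 0).sum(dim=1) <= 2).all()  # at most 2 of every 4 survive
```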
A library for unit scaling in PyTorch
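Not the library's API: a conceptual sketch of the unit-scaling idea, scaling each op so activations keep roughly unit variance (here, a linear op scaled by 1/sqrt(fan_in)).

```python
# Conceptual unit scaling: divide a matmul by sqrt(fan_in) so unit-variance
# inputs and weights yield roughly unit-variance outputs, no init tricks needed.
import math
import torch

def unit_scaled_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    fan_in = weight.shape[1]
    return (x @ weight.t()) / math.sqrt(fan_in)

x = torch.randn(4096, 1024)  # unit-variance activations
w = torch.randn(1024, 1024)  # unit-variance weights
print(unit_scaled_linear(x, w).std())  # ~1.0
```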
Tile primitives for speedy kernels
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
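A minimal sketch of Transformer Engine's PyTorch API, assuming an FP8-capable GPU (Hopper/Ada/Blackwell); the layer sizes and recipe settings here are illustrative, not prescriptive.

```python
# Run a te.Linear forward pass under FP8 autocast with a delayed-scaling recipe.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
layer = te.Linear(768, 3072).cuda()
x = torch.randn(32, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # GEMM executes in FP8, accumulating at higher precision
```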
Hackable and optimized Transformers building blocks, supporting a composable construction.
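A small usage sketch of the library's memory-efficient attention op; tensors follow the (batch, seq_len, heads, head_dim) layout, and the sizes are arbitrary.

```python
# Fused attention via xformers.ops.memory_efficient_attention: never
# materializes the full (seq_len x seq_len) attention matrix.
import torch
from xformers.ops import memory_efficient_attention

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

out = memory_efficient_attention(q, k, v)  # same shape as q
```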
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
High-speed GEMV kernels achieving up to a 2.7x speedup over the PyTorch baseline.
Zero Bubble Pipeline Parallelism
PyTorch code and models for the DINOv2 self-supervised learning method.
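Loading a pretrained DINOv2 backbone via torch.hub, as the repo's README describes; dinov2_vits14 is one of the published variants, and the random tensor stands in for a normalized image.

```python
# Extract a global image embedding with the ViT-S/14 DINOv2 backbone.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a normalized RGB image
with torch.no_grad():
    emb = model(img)  # (1, 384) CLS-token embedding
print(emb.shape)
```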
neonsecret / xformers
Forked from yocabon/xformers. Hackable and optimized Transformers building blocks, supporting a composable construction.
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
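A minimal Diffusers sketch: load a pretrained text-to-image pipeline and sample one image. The SDXL checkpoint id is just an example; any Diffusers-format checkpoint works with DiffusionPipeline.from_pretrained.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```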
Submit stacked diffs to GitHub on the command line
Development repository for the Triton language and compiler
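The canonical introductory Triton kernel, an elementwise vector add: each program instance handles one BLOCK_SIZE-wide slice of the tensors, with out-of-range lanes masked off.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```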
Transformer-related optimizations, including BERT and GPT
Sparsity-aware deep learning inference runtime for CPUs
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
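Not SparseML's recipe format: a plain-PyTorch illustration of the kind of transform such recipes apply, one-shot L1 magnitude pruning of a linear layer to 90% sparsity.

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # mask the smallest 90%
prune.remove(layer, "weight")                            # bake the mask in

print(f"weight sparsity: {(layer.weight == 0).float().mean().item():.1%}")
```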