- Cisco Systems
- Cary, North Carolina
- UTC -04:00
- in/bradenhelmer
Pinned
- nn_c (Public)
  High-performance CNN framework in C/CUDA: AVX-512 im2col convolution, warp-level shuffle reductions for softmax, fused optimizer kernels, and a 256-byte aligned workspace allocator for coalesced gl…
  C
- nvim-syncer (Public)
  A lightweight Neovim plugin to sync files across hosts using Rsync.
  Lua 2
- custom-mpi-impl (Public)
  MPI library implemented from scratch in C over Unix sockets: point-to-point send/recv, gather, broadcast, and barrier primitives built to spec without using any MPI library code.
  C
- cfd-lake (Public)
  Parallel simulation of 2D wave propagation using central finite differences. Implementations across CUDA, multi-GPU+MPI, OpenMP, OpenACC, and Mojo. 20x speedup on large grids.
  Cuda
- LER-IR (Public)
  MLIR compiler for loop redundancy elimination: implements the GLORE algorithm (OOPSLA '17) with a custom dialect, hand-written lexer/parser, and lowering pipeline through affine/scf/arith/memref to…
  Java
- llvm/llvm-project (Public)
  The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.


