Intel Labs
- Zurich, Switzerland
- in/asiemieniuk
Stars
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Kimi K2 is the large language model series developed by the Moonshot AI team.
Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…
🇪🇺 💶 Generate e-invoices (E-Rechnung in German) conforming to EN16931 (Factur-X/ZUGFeRD, UBL, CII, XRechnung aka X-Rechnung) from LibreOffice Calc/Excel data or JSON.
Distributed Compiler based on Triton for Parallel Systems
Custom Bindings for Enzyme Automatic Differentiation Tool and Interfacing with JAX.
A Datacenter Scale Distributed Inference Serving Framework
Intel® Tensor Processing Primitives extension for PyTorch*
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
This is a repository listing companies which offer full-time remote jobs with Spanish contracts
A feature-rich command-line audio/video downloader
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Efficient Triton Kernels for LLM Training
A modern model graph visualizer and debugger
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Type in Morse code by repeatedly slamming your laptop shut
A set of short tests designed to check the Intel GPU software environment and its ability to execute user-generated code.
The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM
Best practice for training LLaMA models in Megatron-LM
Ongoing research training transformer models at scale
The simplest, fastest repository for training/finetuning medium-sized GPTs.
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
triton-lang / triton-cpu
Forked from triton-lang/triton
An experimental CPU backend for Triton
GIM: Learning Generalizable Image Matcher From Internet Videos (ICLR 2024 Spotlight)
PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolutions and Fused Deep Learning Primitives