- Earth
- UTC +08:00
- https://www.linkedin.com/in/pmixer/
Highlights
- Pro
Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse)
Offline optimization of your disaggregated Dynamo graph
A list of papers for Graph Retrieval-Augmented Generation (GraphRAG).
🔥 Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
A song aesthetic evaluation toolkit trained on SongEval.
Minimalistic 4D-parallelism distributed training framework for educational purposes
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
This package contains the original 2012 AlexNet code.
A Datacenter Scale Distributed Inference Serving Framework
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
A faster int-to-int hashmap implemented in C++.
Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
This is a Chinese translation of the CUDA programming guide
The Triton TensorRT-LLM Backend
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…