- AMD
- Houston, TX
- https://www.linkedin.com/in/pengsun86/
Stars
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Online resources for Python Crash Course, 3rd edition, from No Starch Press.
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Accessible large language models via k-bit quantization for PyTorch.
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
A collection of libraries to optimise AI model performance
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
This repository contains demos I made with the Transformers library by HuggingFace.
openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
Models and examples built with TensorFlow
Benchmarks to capture important workloads.
Install guide of ROCm and Tensorflow on Ubuntu for the RX580
A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.
A cheatsheet of modern C++ language and library features.
Tool to run rccl-tests/nccl-tests based on an application and gather performance data.