Stars
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Formatron empowers everyone to control the format of language models' output with minimal overhead.
IbrahimAmin1 / NeMo
Forked from NVIDIA-NeMo/NeMo. NeMo: a toolkit for conversational AI
TSCMamba: Mamba meets multi-view learning for time series classification
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Open-source observability for your GenAI or LLM application, based on OpenTelemetry
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
[NeurIPS 2023] DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization
A (still growing) paper list of Evolutionary Computation (EC) published in some (rather than all) top-tier (and also EC-focused) journals and conferences. For EC-focused publications, only Parallel/Dist…
Our model BUDDI learns the joint distribution of interacting people
Development repository for the Triton language and compiler (a minimal kernel sketch appears after this list)
Likelihood-free AMortized Posterior Estimation with PyTorch
Software and instructions for setting up and running a self-driving lab (autonomous experimentation) demo using dimmable RGB LEDs, an 8-channel spectrophotometer, a microcontroller, and an adaptive…
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks (the cache-eviction idea is sketched after this list)
Universal LLM Deployment Engine with ML Compilation
The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud.
This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the …
A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list)
Hackable and optimized Transformers building blocks, supporting a composable construction.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024)
A fast inference library for running LLMs locally on modern consumer-class GPUs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (see the quantization sketch after this list). Documentation:
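For the Triton entry above, a minimal sketch of what a Triton kernel looks like, in the style of the project's tutorials: an element-wise vector add. The tensor size and block size are illustrative choices, not values from this list.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # guard against the ragged final block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program per 1024 elements
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```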
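For the attention-sinks entry, a sketch of the cache policy the paper describes: keep a few initial "sink" tokens plus a recent window, and evict the middle so memory stays constant. The function name and the `sink`/`window` defaults are illustrative assumptions, not the repo's API.

```python
# Illustrative sketch (not the repo's code) of attention-sink KV-cache
# eviction: retain the first `sink` positions plus the most recent
# `window` positions; everything in between is dropped.
def kv_cache_keep_indices(seq_len: int, sink: int = 4, window: int = 1020) -> list[int]:
    if seq_len <= sink + window:
        return list(range(seq_len))  # cache still fits; keep everything
    return list(range(sink)) + list(range(seq_len - window, seq_len))

# With sink=4, window=8: positions 0-3 survive plus the last 8 tokens.
print(kv_cache_keep_indices(20, sink=4, window=8))  # [0, 1, 2, 3, 12, ..., 19]
```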
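For the vLLM entry, a minimal usage sketch of its offline-inference API; the model id, prompt, and sampling parameters are placeholders.

```python
from vllm import LLM, SamplingParams

# Any Hugging Face-hosted causal LM id works here; opt-125m is just small.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```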
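For the AutoAWQ entry, a sketch of its quantize-and-save flow, assuming an AWQ-compatible Hugging Face model; the model path, output directory, and config values are illustrative (the config shown follows the library's documented defaults).

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # assumption: any AWQ-supported model
quant_path = "mistral-7b-awq"             # assumption: local output directory

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# 4-bit weights, group size 128, zero-point quantization, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```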