neory9771
  • PI Systems
  • London

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 380 18 Updated Apr 13, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 732 55 Updated Aug 6, 2025

Formatron empowers everyone to control the format of language models' output with minimal overhead.

Python 232 8 Updated Jun 7, 2025

NeMo: a toolkit for conversational AI

Python 1 Updated Mar 21, 2024

TSCMamba: Mamba meets multi-view learning for time series classification

Python 17 1 Updated Apr 16, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,439 1,970 Updated Dec 21, 2025

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 6,703 850 Updated Dec 21, 2025

Reformatted Alignment

JavaScript 112 7 Updated Sep 23, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 2,084 186 Updated Jun 30, 2025

[NeurIPS 2023] DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization

Jupyter Notebook 178 26 Updated Sep 29, 2024

A (still growing) paper list of Evolutionary Computation (EC) published in some (rather all) top-tier (and also EC-focused) journals and conferences. For EC-focused publications, only Parallel/Dist…

146 35 Updated Dec 17, 2025

Python 8,673 518 Updated Oct 9, 2024

Our model BUDDI learns the joint distribution of interacting people

Python 164 11 Updated Jul 16, 2024

Ultralytics YOLO 🚀

Python 50,178 9,690 Updated Dec 21, 2025

Development repository for the Triton language and compiler

MLIR 17,893 2,462 Updated Dec 21, 2025

Likelihood-free AMortized Posterior Estimation with PyTorch

Python 131 13 Updated Aug 21, 2024

Software and instructions for setting up and running a self-driving lab (autonomous experimentation) demo using dimmable RGB LEDs, an 8-channel spectrophotometer, a microcontroller, and an adaptive…

Jupyter Notebook 77 13 Updated Nov 24, 2025

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group

Python 529 41 Updated Jun 12, 2025

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

Python 733 43 Updated Apr 10, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,161 395 Updated Jul 11, 2024

Universal LLM Deployment Engine with ML Compilation

Python 21,769 1,891 Updated Dec 11, 2025

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 19,989 1,670 Updated Nov 26, 2025

This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the …

3,012 258 Updated Dec 12, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,878 12,103 Updated Dec 21, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,201 746 Updated Dec 12, 2025

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).

Python 9,127 878 Updated Dec 21, 2025

Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024)

Python 205 19 Updated May 20, 2024

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 4,392 325 Updated Dec 9, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,409 635 Updated Dec 20, 2025

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Python 2,298 295 Updated May 11, 2025