Starred repositories
Learning to Learn in TensorFlow
Linux virtual machines, with a focus on running containers
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse environments
Large language models designed for formal theorem proving through tool-integrated reasoning.
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)
A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, …
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Unofficial WIP LoRA fine-tuning repository for VibeVoice
Reducing spatial redundancy in video recognition. SOTA computational efficiency.
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
MAGI-1: Autoregressive Video Generation at Scale
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
Chinese LLaMA-2 & Alpaca-2 LLMs, phase two of the project, plus 64K long-context models
Codebase for Merging Language Models (ICML 2024)
800,000 step-level correctness labels on LLM solutions to MATH problems
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Official PyTorch implementation of TokenSet.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
Guide to using pre-trained large language models of source code
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Instruct-tune LLaMA on consumer hardware
Code and documentation to train Stanford's Alpaca models, and generate the data.