Stars
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)
Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments
Code and Data for "Language Modeling with Editable External Knowledge"
Secrets of RLHF in Large Language Models Part I: PPO
Ongoing research training transformer models at scale
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).
A beautiful, simple, clean, and responsive Jekyll theme for academics
A library for efficient similarity search and clustering of dense vectors.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Code for fine-tuning the Platypus family of LLMs using LoRA
Reference implementation for DPO (Direct Preference Optimization)
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
Code and documentation to train Stanford's Alpaca models, and generate the data.
Extract addresses and intents from tweet texts
A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Google Research
A modular RL library to fine-tune language models to human preferences
A Collection of BM25 Algorithms in Python
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.
Code for the paper "Simulating Bandit Learning from User Feedback for Extractive Question Answering".