Stars
Implementation of the paper "SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors"
Language model alignment-focused deep learning curriculum
Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses
Safety at Scale: A Comprehensive Survey of Large Model Safety
An interactive HTML pretty-printer for machine learning research in IPython notebooks.
🤗 smolagents: a barebones library for agents that think in code.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
A Python SDK for LLM fine-tuning and inference on RunPod infrastructure
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Kura is a simple reproduction of the CLIO paper, which uses language models to label user behaviour and then recursively clusters the labels based on their embeddings. This helps us understand user behaviour on…
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Code for reproducing sections 4 and 6.2 of the paper "Obfuscated Activations Bypass LLM Latent-Space Defenses"
LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Check out the demo at https://huggingface.co/spaces/facebook/l…
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
A library for efficient patching and automatic circuit discovery.
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from easy questions to hard
Tips for Writing a Research Paper using LaTeX
Accompanies NeurIPS'24 poster "Dense Associative Memory Through the Lens of Random Features"
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
A survey on harmful fine-tuning attacks for large language models
An Extensible Continual Learning Framework Focused on Language Models (LMs)