Stars
Repo for ACL2023 Findings paper "Emergent Modularity in Pre-trained Transformers"
A resource repository for representation engineering in large language models
An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.
DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
Open source replication of Anthropic's Crosscoders for Model Diffing
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
Awesome papers involving LLMs in Social Science.
Mainly records knowledge and interview questions for large language model (LLM) algorithm (application) engineers
Align Anything: Training All-modality Model with Feedback
Scalable RL solution for advanced reasoning of language models
A collection of resources that investigate social agents.
Official code for "Goal-Conditioned On-Policy Reinforcement Learning" (NeurIPS 2024).
Official code for "Iterative Regularized Policy Optimization with Imperfect Demonstrations" (ICML2024).
Demonstration generation and training scripts for fly-craft/VVCGym (ICML2024, ICLR2025, ICML2025).
An efficient goal-conditioned reinforcement learning environment for fixed-wing UAV velocity vector control based on Gymnasium (ICLR2025).
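Welcome to LLM-Dojo, an open-source place to learn about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, GLM, and others), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓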
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgent is a novel automatic prompt optimization method that auton…
An Open-Source Package for Textual Adversarial Attack.
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Master copies of the DISARM frameworks, with generated files to help you explore the data
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"