Stars
Arena-Hard-Auto: An automatic LLM benchmark.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
PaddleFormers is an easy-to-use model zoo library of pre-trained large language models, built on PaddlePaddle.
R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story.
Search-R1: An efficient, scalable RL training framework for LLMs that interleave reasoning and search engine calls, based on veRL
[EMNLP 2025] OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
AgentScope: Agent-Oriented Programming for Building LLM Applications
Let your Claude think
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
Awesome-LLM-Prompt-Optimization: a curated list of advanced prompt optimization and tuning methods in Large Language Models
Evaluate your LLM's response with Prometheus and GPT4 💯
Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a star 🌟
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
🔥 Curated Chinese prompts 🔥, a ChatGPT usage guide that makes ChatGPT more fun and more useful! 🚀
LangGPT: Empowering everyone to become a prompt expert! 🚀 📌 Proposer of the Structured Prompt 📌 Originator of the Meta-Prompt 📌 The most popular paradigm for putting prompts into practice | Language of GPT The pioneering framework for structured & meta-prompt…
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
Layer-wise Analysis of BERT Model for Sentiment Analysis