A curated list of papers on safety in reasoning for Large Reasoning Models (LRMs). Research on reasoning safety for LRMs is still in its early stages; this repository tracks and documents advances in the field. Stay tuned for updates!
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran. [pdf] [repo] [data], 2025.2.17
- Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu. [pdf] [repo], 2025.3.1
- The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang. [pdf], 2025.2.27(v3)
- BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack
Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu. [pdf] [repo], 2025.2.16
- Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha. [pdf] [repo], 2025.2.18
- H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking
Martin Kuo, Jianyi Zhang, Aolin Ding, Qinsi Wang, Louis DiValentin, Yujia Bao, Wei Wei, Hai Li, Yiran Chen. [pdf] [repo], 2025.2.27(v2)
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Rajeev, Rajkumar Ramamurthy, Prapti Trivedi, Vikas Yadav, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudan, James Zou, Nazneen Rajani. [pdf] [data], 2025.3.3
- GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi. [pdf] [data], 2025.1.31
- Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning
Ang Li, Yichuan Mo, Mingjie Li, Yifei Wang, Yisen Wang. [pdf], 2025.2.21(v2)
- OverThink: Slowdown Attacks on Reasoning LLMs
Abhinav Kumar, Jaechul Roh, Ali Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian. [pdf], 2025.2.5(v2)
- o3-mini vs DeepSeek-R1: Which One is Safer?
Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura. [pdf], 2025.1.31(v2)
- Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies
Manojkumar Parmar, Yuvaraj Govindarajulu. [pdf], 2025.1.28
- MetaSC: Test-Time Safety Specification Optimization for Language Models
Víctor Gallego. [pdf], 2025.2.11
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao. [pdf], 2025.2.6
- Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models
Rubing Li, João Sedoc, Arun Sundararajan. [pdf], 2025.2.6
- Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov, Kseniia Studenikina, Bykov Mikhail, Evgeny Burnaev. [pdf], 2025.2.18
- Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
Xuying Li, Zhuo Li, Yuji Kosuga, Victor Bian. [pdf], 2025.3.2
Reading lists related to LLM/LRM safety:
- LightChen233/Awesome-Long-Chain-of-Thought-Reasoning
- tjunlp-lab/Awesome-LLM-Safety-Papers
- ianitow/awesome-guardrails-llm-papers
- We acknowledge that some important works in this field may be missing from this list, and we warmly welcome contributions to help improve it!
- If you would like to promote your work or suggest other relevant papers, please feel free to open an issue or submit a pull request (PR). Your contributions are greatly appreciated, and we thank you in advance for helping enhance this resource!
- Special thanks to Awesome-Efficient-Reasoning, which inspired the structure and template of this project.