Linking-ai/Awesome-LRMs-Safety-Reasoning

A curated paper list on the safety of reasoning in Large Reasoning Models (LRMs). Research on safe reasoning for LRMs is still in its early stages; this repository tracks and documents advances in this evolving field. Stay tuned for updates!


Content

  • Keywords Convention
  • Papers
  • Resources
  • Acknowledgements

Keywords Convention

Each paper entry may be tagged with badges indicating its Abbreviation, the Conference where it appeared, and its Main Features.

Papers

Dataset and Systematic Study

  • SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities. Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran. [pdf] [repo] [data], 2025.2.17
  • Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable. Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu. [pdf] [repo], 2025.3.1

Comprehensive Analysis and Survey

  • The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1. Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang. [pdf], 2025.2.27 (v3)

Attack or Defense Methods

  • BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack. Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu. [pdf] [repo], 2025.2.16
  • Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking. Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha. [pdf] [repo], 2025.2.18
  • H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. Martin Kuo, Jianyi Zhang, Aolin Ding, Qinsi Wang, Louis DiValentin, Yujia Bao, Wei Wei, Hai Li, Yiran Chen. [pdf] [repo], 2025.2.27 (v2)
  • Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models. Meghana Rajeev, Rajkumar Ramamurthy, Prapti Trivedi, Vikas Yadav, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudan, James Zou, Nazneen Rajani. [pdf] [data], 2025.3.3

Evaluation and Guardrail

  • GuardReasoner: Towards Reasoning-based LLM Safeguards. Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi. [pdf] [data], 2025.1.31

Other Work

  • Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning. Ang Li, Yichuan Mo, Mingjie Li, Yifei Wang, Yisen Wang. [pdf], 2025.2.21 (v2)
  • OverThink: Slowdown Attacks on Reasoning LLMs. Abhinav Kumar, Jaechul Roh, Ali Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian. [pdf], 2025.2.5 (v2)
  • o3-mini vs DeepSeek-R1: Which One is Safer? Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura. [pdf], 2025.1.31 (v2)
  • Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies. Manojkumar Parmar, Yuvaraj Govindarajulu. [pdf], 2025.1.28
  • MetaSC: Test-Time Safety Specification Optimization for Language Models. Víctor Gallego. [pdf], 2025.2.11
  • Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment. Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao. [pdf], 2025.2.6
  • Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models. Rubing Li, João Sedoc, Arun Sundararajan. [pdf], 2025.2.6
  • Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models. Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov, Kseniia Studenikina, Bykov Mikhail, Evgeny Burnaev. [pdf], 2025.2.18
  • Output Length Effect on DeepSeek-R1's Safety in Forced Thinking. Xuying Li, Zhuo Li, Yuji Kosuga, Victor Bian. [pdf], 2025.3.2

Resources

Reading lists related to LLM and LRM safety:

Acknowledgements

  • We acknowledge that some important works in this field may be missing from this list. We warmly welcome contributions to help us improve!
  • If you would like to promote your work or suggest other relevant papers, please feel free to open an issue or submit a pull request (PR). Your contributions are greatly appreciated, and we thank you in advance for helping enhance this resource!
  • Special thanks to Awesome-Efficient-Reasoning, which inspired the structure and template of this project.
