1 paper across 1 session
We propose SAFEPATH, a lightweight method that aligns Large Reasoning Models to detect and suppress harmful chain-of-thought reasoning by injecting a brief safety signal at the start of reasoning.