Large Reasoning Models (LRMs)

1 paper across 1 session

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment

#1405 · Wonje Jeung, Yoon Sangyeon, Minsuk Kahng, Albert No

We propose SAFEPATH, a lightweight method that aligns Large Reasoning Models to detect and suppress harmful chain-of-thought reasoning by injecting a brief safety signal at the start of reasoning.
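
As a rough illustration of what "a brief safety signal at the start of reasoning" could look like, the sketch below places a short primer right after a reasoning-start marker and checks whether a generation begins with it. This is not the authors' implementation: the marker `<think>`, the primer text, and both function names are assumptions made for this example. It only shows where such a signal would sit in the text, not the alignment procedure that teaches the model to produce it.

```python
# Hypothetical sketch: the marker, primer text, and function names are
# placeholders chosen for illustration, not taken from the SAFEPATH paper.
THINK_OPEN = "<think>"  # assumed reasoning-start marker
SAFETY_PRIMER = "Let's think about safety first."  # assumed primer text


def build_reasoning_prefix(prompt: str, primer: str = SAFETY_PRIMER) -> str:
    """Return the prompt followed by the reasoning marker and the primer.

    The model would continue generating its chain of thought after the primer.
    """
    return f"{prompt}\n{THINK_OPEN}\n{primer}\n"


def starts_with_primer(generation: str, primer: str = SAFETY_PRIMER) -> bool:
    """Check whether a generated reasoning trace opens with the primer."""
    body = generation.lstrip()
    if body.startswith(THINK_OPEN):
        # Skip the reasoning marker and any whitespace that follows it.
        body = body[len(THINK_OPEN):].lstrip()
    return body.startswith(primer)
```

A helper like `starts_with_primer` could serve as a simple check on whether a model's reasoning opens with the signal, under the same assumptions about the marker and primer text.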