Associate Professor, Yonsei University
2 papers at NeurIPS 2025
We propose SAFEPATH, a lightweight method that aligns Large Reasoning Models to detect and suppress harmful chain-of-thought reasoning by injecting a brief safety signal at the start of reasoning.
We derive information-theoretic identities for discrete diffusion, revealing score-based losses as exact mutual information decompositions and enabling principled log-likelihood estimation.