Reasoning Models Sometimes Output Illegible Chains of Thought

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

We find that the reasoning traces of an RL-trained model often contain illegible segments, potentially compromising chain-of-thought monitoring for detecting malicious behavior.