3 papers across 3 sessions
Transformers can learn self-verifying reflection without natural language, and reinforcement learning improves their performance by exploiting shallow statistical patterns.