1 paper across 1 session
Transformers can learn self-verifying reflection without natural language, and reinforcement learning improves performance by exploiting shallow statistical patterns.