PhD student, The Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
Transformers can learn self-verifying reflection without language, and reinforcement learning enhances performance through shallow statistical patterns.