PhD student, University of Maryland, College Park
1 paper at NeurIPS 2025
We show that recurrent-depth transformers can be scaled into effective language models, with particularly strong gains on reasoning tasks when given additional test-time compute.