Principal Researcher, ELLIS Institute Tübingen
1 paper at NeurIPS 2025
We show that recurrent-depth transformers can be scaled into effective language models, with particularly strong gains on reasoning tasks when given additional compute.