PhD student, University of Notre Dame
1 paper at NeurIPS 2025
We prove precisely how deeper transformers (with appropriate rounding) become more expressive, and show that empirical behaviour tracks our theory.