PhD student, Massachusetts Institute of Technology
1 paper at NeurIPS 2025
Theoretical study of the impact of normalization layers on the evolution of token representations as they propagate through the layers of a transformer.