Full Professor, Massachusetts Institute of Technology
2 papers at NeurIPS 2025
Theoretical study of the impact of normalization layers on the evolution of token representations as they propagate through the layers of a transformer.