PhD student, ELLIS Institute Tübingen
1 paper at NeurIPS 2025
This paper presents a holistic and approximate normalization approach that accelerates GPT training by up to 40% while eliminating the need for weight decay and learning rate warm-up.