MS student, University of Freiburg (Albert-Ludwigs-Universität Freiburg)
1 paper at NeurIPS 2025
This paper presents a holistic and approximate normalization approach that accelerates GPT training by up to 40% while eliminating the need for weight decay and learning rate warm-up.