Assistant Professor, University of Surrey
3 papers at NeurIPS 2025
We propose GPAS (Gradient-Preserving Activation Scaling), a simple method that scales activations without scaling their gradients, accelerating the pretraining convergence of LLMs.
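The core mechanism, scaling an activation in the forward pass while leaving its gradient untouched, can be expressed with a stop-gradient. Below is a minimal PyTorch sketch under that reading; the `gpas_scale` helper, the fixed `scale` value, and where it is applied are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def gpas_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Scale the forward activation by `scale` while keeping the
    backward pass an identity (gradient w.r.t. `x` is unchanged)."""
    # Forward value: x + (scale - 1) * x = scale * x.
    # Backward: the detached term carries no gradient, so dy/dx = 1.
    return x + ((scale - 1.0) * x).detach()

x = torch.randn(4, 8, requires_grad=True)
y = gpas_scale(x, 0.5)
y.sum().backward()
assert torch.allclose(y, 0.5 * x.detach())          # forward is scaled
assert torch.allclose(x.grad, torch.ones_like(x))   # gradient is not
```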
In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) that deeper layers are far less effective than expected.
We propose AlphaDecay, a method that assigns per-module weight decay strengths guided by the heavy-tailedness of each module's weight spectrum, improving large language model performance.
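A sketch of one way per-module decay could be driven by heavy-tailedness, assuming: (i) heavy-tailedness is measured by a power-law exponent alpha fit (here, a simple Hill estimator) to the eigenvalue spectrum of W^T W, and (ii) heavier-tailed modules (smaller alpha) receive weaker decay. The `hill_alpha` and `alpha_decay_groups` helpers, the alpha-to-decay mapping, and all hyperparameters are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def hill_alpha(weight: torch.Tensor, tail_frac: float = 0.1) -> float:
    """Hill estimator of the power-law exponent of the eigenvalue
    spectrum of W^T W (smaller alpha = heavier tail)."""
    w = weight.detach().float().reshape(weight.shape[0], -1)
    eigs = torch.linalg.svdvals(w) ** 2          # eigenvalues of W^T W
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(2, int(tail_frac * eigs.numel()))    # size of the fitted tail
    tail = eigs[:k]
    return 1.0 + k / torch.log(tail / tail[-1]).sum().item()

def alpha_decay_groups(model: torch.nn.Module, base_wd: float = 0.1):
    """Build optimizer param groups whose weight decay scales with each
    matrix's alpha, normalized so the mean decay stays at base_wd."""
    mats = [p for p in model.parameters() if p.dim() == 2]
    alphas = torch.tensor([hill_alpha(p) for p in mats])
    scales = alphas / alphas.mean()              # heavier tail -> weaker decay
    groups = [{"params": [p], "weight_decay": base_wd * s.item()}
              for p, s in zip(mats, scales)]
    rest = [p for p in model.parameters() if p.dim() != 2]
    groups.append({"params": rest, "weight_decay": 0.0})  # no decay on biases
    return groups

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU(),
                            torch.nn.Linear(64, 64))
opt = torch.optim.AdamW(alpha_decay_groups(model), lr=1e-3)
```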