PhD student, The Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
We propose GPAS (Gradient-Preserving Activation Scaling), a simple method that scales activations in the forward pass without scaling the corresponding gradients, accelerating the pretraining convergence of LLMs.
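The summary describes the core trick clearly enough to sketch. Below is a minimal PyTorch illustration, not the paper's implementation: it assumes the standard stop-gradient (straight-through) construction for "scale the value, not the gradient", and the per-layer sigmoid gate (`gate_logit`) is a hypothetical parameterization; GPAS's actual gate and its placement inside the Pre-LN block may differ.

```python
import torch
import torch.nn as nn

class GradientPreservingScale(nn.Module):
    """Scales activations in the forward pass while leaving the
    gradient w.r.t. the input unscaled.

    Forward value:  y = alpha * x        (since x + (alpha - 1) * x)
    Backward:       dy/dx = 1            (identity, not alpha)
    The gate still learns, because alpha receives a gradient
    through the detached copy of x.
    """

    def __init__(self):
        super().__init__()
        # Hypothetical learnable gate; sigmoid keeps the scale in (0, 1).
        self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.gate_logit)
        # x.detach() removes the scaling term from x's computation
        # graph, so dL/dx flows through the identity path unscaled,
        # while alpha is still updated from the detached values.
        return x + (alpha - 1.0) * x.detach()
```

A quick check that the gradient really is unscaled:

```python
layer = GradientPreservingScale()
x = torch.randn(4, 8, requires_grad=True)
layer(x).sum().backward()
print(x.grad)  # all ones: the backward pass ignores alpha
```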