Researcher, International Digital Economy Academy
1 paper at NeurIPS 2025
We propose GPAS (Gradient-Preserving Activation Scaling), a simple method that scales activations in the forward pass without scaling their gradients, accelerating the pretraining convergence of LLMs.
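The core idea, scaling an activation in the forward pass while leaving its gradient untouched, can be sketched with a stop-gradient trick. This is a minimal illustrative sketch in PyTorch, not the paper's exact formulation; the function name `gpas_scale` and the fixed scale value are assumptions for the example.

```python
import torch

def gpas_scale(x: torch.Tensor, s: float) -> torch.Tensor:
    # Forward: y = s * x (activation is scaled by s).
    # Backward: dy/dx = 1, because the scaling term is detached
    # from the autograd graph, so the gradient is NOT scaled.
    return x + ((s - 1.0) * x).detach()

x = torch.tensor([2.0], requires_grad=True)
y = gpas_scale(x, 0.5)  # forward value is 0.5 * 2.0 = 1.0
y.backward()
# x.grad is 1.0, as if no scaling had been applied
```

A plain `y = s * x` would multiply the gradient by `s` as well; detaching the correction term decouples the forward scale from the backward pass.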