Assistant Professor, University of Surrey
3 papers at NeurIPS 2025
We propose GPAS (Gradient-Preserving Activation Scaling), a simple method that scales activations without scaling their gradients, accelerating the pretraining convergence of LLMs.
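The core mechanism, scaling an activation in the forward pass while leaving its gradient untouched, can be expressed with a stop-gradient. Below is a minimal PyTorch sketch under that reading; the `gpas_scale` helper, the fixed `scale` value, and where it is applied are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def gpas_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Scale the forward activation by `scale` while keeping the
    backward pass an identity (gradient w.r.t. `x` is unchanged)."""
    # Forward value: x + (scale - 1) * x = scale * x.
    # Backward: the detached term carries no gradient, so dy/dx = 1.
    return x + ((scale - 1.0) * x).detach()

x = torch.randn(4, 8, requires_grad=True)
y = gpas_scale(x, 0.5)
y.sum().backward()
assert torch.allclose(y, 0.5 * x.detach())          # forward is scaled
assert torch.allclose(x.grad, torch.ones_like(x))   # gradient is not
```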
In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) that deeper layers are far less effective than expected.
We propose AlphaDecay, a method that assigns per-module weight decay strengths guided by the heavy-tailedness of each module's weight spectrum, improving large language model performance.
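A sketch of one way per-module decay could be driven by heavy-tailedness, assuming: (i) heavy-tailedness is measured by a power-law exponent alpha fit (here, a simple Hill estimator) to the eigenvalue spectrum of W^T W, and (ii) heavier-tailed modules (smaller alpha) receive weaker decay. The `hill_alpha` and `alpha_decay_groups` helpers, the alpha-to-decay mapping, and all hyperparameters are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def hill_alpha(weight: torch.Tensor, tail_frac: float = 0.1) -> float:
    """Hill estimator of the power-law exponent of the eigenvalue
    spectrum of W^T W (smaller alpha = heavier tail)."""
    w = weight.detach().float().reshape(weight.shape[0], -1)
    eigs = torch.linalg.svdvals(w) ** 2          # eigenvalues of W^T W
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(2, int(tail_frac * eigs.numel()))    # size of the fitted tail
    tail = eigs[:k]
    return 1.0 + k / torch.log(tail / tail[-1]).sum().item()

def alpha_decay_groups(model: torch.nn.Module, base_wd: float = 0.1):
    """Build optimizer param groups whose weight decay scales with each
    matrix's alpha, normalized so the mean decay stays at base_wd."""
    mats = [p for p in model.parameters() if p.dim() == 2]
    alphas = torch.tensor([hill_alpha(p) for p in mats])
    scales = alphas / alphas.mean()              # heavier tail -> weaker decay
    groups = [{"params": [p], "weight_decay": base_wd * s.item()}
              for p, s in zip(mats, scales)]
    rest = [p for p in model.parameters() if p.dim() != 2]
    groups.append({"params": rest, "weight_decay": 0.0})  # no decay on biases
    return groups

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU(),
                            torch.nn.Linear(64, 64))
opt = torch.optim.AdamW(alpha_decay_groups(model), lr=1e-3)
```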