Associate Professor, Shenzhen University of Advanced Technology
2 papers at NeurIPS 2025
We propose AlphaDecay, a per-module weight decay method that is guided by the heavy-tailedness of each module's weight spectrum and improves large language model performance.