PhD student, Institute of Automation, Chinese Academy of Sciences
2 papers at NeurIPS 2025
We propose AlphaDecay, a method that assigns weight decay per module based on each module's heavy-tailedness, improving the performance of large language models.