PhD student, University of Texas at Austin
1 paper at NeurIPS 2025
We propose AlphaDecay, a per-module weight decay method that sets each module's decay strength according to the heavy-tailedness of its weight spectrum, improving large language model performance.