1 paper across 1 session
We propose AlphaDecay, a per-module weight decay method that is guided by the heavy-tailedness of each module's weight spectrum and improves large language model performance.
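
To make the idea of per-module weight decay concrete, here is a minimal sketch, not the authors' implementation. It assumes that heavy-tailedness is measured by a Hill-type estimate of the power-law exponent of the eigenvalues of W^T W (a smaller exponent means a heavier tail), and that modules with heavier tails receive weaker decay through a simple linear mapping. The function names `hill_alpha` and `assign_decays`, the parameters `k_frac`, `s_min`, and `s_max`, and the mapping itself are all hypothetical choices made for illustration.

```python
import numpy as np


def hill_alpha(weight, k_frac=0.5):
    """Hill-type estimate of the power-law exponent of the eigenvalue tail.

    Assumption: the eigenvalues of W^T W (squared singular values of W) are
    nonzero and there are at least two of them.
    """
    eigs = np.sort(np.linalg.svd(weight, compute_uv=False) ** 2)  # ascending
    n = len(eigs)
    k = max(1, min(n - 1, int(n * k_frac)))  # number of tail eigenvalues
    tail = eigs[n - k:]                      # the k largest eigenvalues
    threshold = eigs[n - k - 1]              # largest eigenvalue below the tail
    return 1.0 + k / np.sum(np.log(tail / threshold))


def assign_decays(alphas, base_decay=0.1, s_min=0.5, s_max=1.5):
    """Map per-module exponents to per-module weight decay values.

    Assumption: a linear map in which the smallest exponent (heaviest tail)
    gets base_decay * s_min and the largest gets base_decay * s_max.
    """
    names = list(alphas)
    a = np.array([alphas[name] for name in names], dtype=float)
    span = a.max() - a.min()
    if span == 0.0:
        scale = np.ones_like(a)  # all modules equally heavy-tailed
    else:
        scale = s_min + (a - a.min()) / span * (s_max - s_min)
    return {name: float(base_decay * s) for name, s in zip(names, scale)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical modules: random matrices standing in for trained weights.
    modules = {
        "attn": rng.standard_normal((64, 32)),
        "mlp": rng.standard_normal((128, 32)),
    }
    alphas = {name: hill_alpha(w) for name, w in modules.items()}
    print(assign_decays(alphas))
```

In a training loop, the resulting dictionary could be used to build one optimizer parameter group per module, each with its own weight decay value.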