PhD student, Mila - Quebec Artificial Intelligence Institute
1 paper at NeurIPS 2025
We prove that stochastic momentum can improve the scaling-law exponents over SGD on power-law random features when its hyperparameters are chosen to depend appropriately on the data dimension or model size.