2 papers across 2 sessions
This paper provides an optimal, non-asymptotic uncertainty bound for kernel-based estimation, assuming only a general bound on the noise energy.
We introduce a Functional Scaling Law that predicts the full SGD loss dynamics under arbitrary learning-rate schedules.