We derive high-probability excess risk bounds of order up to $\tilde{O}(1/n^2)$ for empirical risk minimization (ERM), gradient descent (GD), and stochastic gradient descent (SGD). Our high-probability bounds on the generalization error of gradients for nonconvex problems are also the sharpest.