We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on n data points when the input dimension is very large.
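For illustration, here is a minimal NumPy sketch of that training setup: gradient flow is discretized by gradient descent with a small step size on the squared loss, with the output layer held fixed (a common simplification in such analyses). The dimensions, step size, and initialization scales below are placeholders, not the paper's regime.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 1000, 50, 200                           # input dim (large), data points, hidden width
X = rng.standard_normal((n, d)) / np.sqrt(d)      # high-dimensional inputs
y = rng.standard_normal(n)                        # targets

W = rng.standard_normal((m, d)) / np.sqrt(d)      # trainable hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output weights

eta, steps = 1e-2, 5000                           # small step size approximates gradient flow
for _ in range(steps):
    h = X @ W.T                                   # pre-activations, shape (n, m)
    r = np.maximum(h, 0.0) @ a - y                # residuals of the network output
    grad = ((h > 0) * np.outer(r, a)).T @ X       # d/dW of 0.5 * ||residual||^2
    W -= eta * grad                               # Euler step of the gradient flow

print(0.5 * np.sum((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))  # final training loss
```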
Our work shows that in online convex optimization over ℓp-balls (p > 2), anytime optimality can be achieved with Follow-the-Regularized-Leader (FTRL) using adaptive regularization, and that for separable regularizers this adaptivity is necessary.
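As a rough sketch of the mechanism, the snippet below runs FTRL with an adaptive quadratic regularizer whose strength grows with the observed gradients. The schedule λ_t = sqrt(Σ_{s≤t} ||g_s||²), the radial rescaling onto the ℓp-ball (a cheap stand-in for exact constrained minimization), and the function names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def lp_rescale(x, p, radius=1.0):
    # Radially rescale x into the l_p ball; not the exact projection for general p.
    norm = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return x if norm <= radius else x * (radius / norm)

def ftrl_adaptive(grads, p=3.0, dim=5):
    # FTRL: play the (approximate) minimizer of <G_t, x> + (lam_t / 2) ||x||_2^2
    # over the l_p ball, with lam_t set adaptively from past gradients.
    G = np.zeros(dim)            # running sum of observed linear losses
    sq = 0.0                     # running sum of squared gradient norms
    iterates = []
    for g in grads:
        G += g
        sq += float(g @ g)
        lam = np.sqrt(sq) + 1e-12          # adaptive regularization strength
        x = lp_rescale(-G / lam, p)        # unconstrained minimizer, rescaled into the ball
        iterates.append(x.copy())
    return iterates

rng = np.random.default_rng(1)
grads = [rng.standard_normal(5) for _ in range(100)]  # adversary's linear losses
print(ftrl_adaptive(grads, p=3.0)[-1])
```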