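This paper establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(x^k)\|_1\right]\leq O(\frac{\sqrt{d}C}{K^{1/4}})$ for AdamW, measured by the $\ell_1$ norm. Here $K$ is the number of iterations, $d$ is the model dimension, and $C$ matches the constant in the optimal convergence rate of SGD.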
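To make the left-hand side concrete, here is a minimal sketch. It assumes a standard AdamW update: Adam with bias correction and decoupled weight decay, in the style of PyTorch's `torch.optim.AdamW`. This is not necessarily the exact variant analyzed in the paper. The sketch also assumes exact, noise-free gradients, so the expectation is trivial. The hypothetical helper `adamw_avg_l1_grad_norm` runs $K$ steps starting from $x^1 = $ `x0` and returns $\frac{1}{K}\sum_{k=1}^K\|\nabla f(x^k)\|_1$. The hyperparameter defaults and the toy quadratic are illustrative choices, not values from the paper.

```python
import math
from typing import Callable, List, Sequence


def adamw_avg_l1_grad_norm(
    grad: Callable[[List[float]], List[float]],
    x0: Sequence[float],
    K: int,
    lr: float = 1e-2,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    weight_decay: float = 1e-2,
) -> float:
    """Run K AdamW steps from x^1 = x0; return (1/K) * sum_k ||grad f(x^k)||_1.

    Hypothetical helper for illustration only. Gradients are exact, so the
    expectation in the bound reduces to a single deterministic run.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    d = len(x0)
    x = list(x0)
    m = [0.0] * d  # first-moment (momentum) estimate
    v = [0.0] * d  # second-moment estimate
    total = 0.0
    for k in range(1, K + 1):
        g = grad(x)
        total += sum(abs(gi) for gi in g)  # ||grad f(x^k)||_1
        for i in range(d):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** k)  # bias-corrected moments
            v_hat = v[i] / (1 - beta2 ** k)
            # Decoupled weight decay: applied to x directly, not folded into g.
            x[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * x[i])
    return total / K


def quadratic_grad(x: List[float]) -> List[float]:
    """Gradient of the toy f(x) = 0.5 * sum_i a_i * x_i^2 with a = (1, 2, 3, 4)."""
    a = [1.0, 2.0, 3.0, 4.0]
    return [ai * xi for ai, xi in zip(a, x)]


if __name__ == "__main__":
    for K in (10, 100, 1000):
        print(K, adamw_avg_l1_grad_norm(quadratic_grad, [1.0] * 4, K))
```

On this toy problem, the running average falls as $K$ grows. The theorem concerns the expected value of the same average under stochastic gradients and bounds it by $O(\sqrt{d}C/K^{1/4})$.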