Statistical Guarantees for High-Dimensional Stochastic Gradient Descent

Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu

University of Chicago · University of California San Diego · University of Twente

stochastic gradient descent · high dimension · constant learning rate · geometric-moment contraction · tail probability

Abstract

Stochastic Gradient Descent (SGD) and its Ruppert–Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood.

In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing asymptotic stationarity of the iterates.
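The coupling idea can be illustrated numerically. The following is a minimal sketch under assumptions that are ours and not taken from the paper: a least-squares loss, a standard Gaussian design, and the hypothetical helper names `sgd_step` and `coupled_distances`. Two SGD chains are started from different points but driven by the same data stream, so each update has the autoregressive form $\theta_k = \theta_{k-1} - \eta \nabla \ell(\theta_{k-1}; \xi_k)$ with a shared innovation $\xi_k$. Under a contraction of the kind described above, the distance between the two chains should decay roughly geometrically, meaning the initial point is forgotten.

```python
import math
import random


def sgd_step(theta, x, y, eta):
    """One constant-step SGD update for the squared loss 0.5 * (x . theta - y)**2."""
    residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
    return [ti - eta * residual * xi for ti, xi in zip(theta, x)]


def coupled_distances(d=5, eta=0.05, n_steps=500, seed=0):
    """Run two SGD chains on the SAME samples from different initial points.

    Returns the Euclidean distance between the chains after each step.
    All modelling choices here (linear model, Gaussian data) are assumptions
    made for illustration only.
    """
    rng = random.Random(seed)
    theta_star = [1.0] * d                              # assumed true parameter
    theta_a = [0.0] * d                                 # first initial point
    theta_b = [rng.gauss(0.0, 5.0) for _ in range(d)]   # second initial point
    dists = [math.dist(theta_a, theta_b)]
    for _ in range(n_steps):
        # One shared innovation (x, y) drives both chains: this is the coupling.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, 1.0)
        theta_a = sgd_step(theta_a, x, y, eta)
        theta_b = sgd_step(theta_b, x, y, eta)
        dists.append(math.dist(theta_a, theta_b))
    return dists


if __name__ == "__main__":
    dists = coupled_distances()
    for k in (0, 100, 200, 300, 400, 500):
        print(f"step {k:3d}: distance between coupled chains = {dists[k]:.3e}")
```

Plotting `log(dists[k])` against `k` should give an approximately straight, downward-sloping line in this toy setting. This is only an empirical picture of contraction for one loss and one learning rate, not a substitute for the moment bounds proved in the paper.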

Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q \geq 2$ in general $\ell^{s}$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis, which yields a probabilistic bound for high-dimensional ASGD.
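To make the objects in these statements concrete, the sketch below computes the Ruppert–Polyak average of constant-learning-rate SGD iterates and reports its estimation error in several $\ell^{s}$-norms, including $\ell^{\infty}$. The setting is an assumption of ours (a noisy linear model with Gaussian design), and the names `ls_norm` and `asgd_errors` are hypothetical; it shows only how the error would be measured, not the rates established in the paper.

```python
import math
import random


def ls_norm(v, s):
    """l^s norm of a vector; s = math.inf gives the max (l^infinity) norm."""
    if s == math.inf:
        return max(abs(vi) for vi in v)
    return sum(abs(vi) ** s for vi in v) ** (1.0 / s)


def asgd_errors(d=5, eta=0.05, n_steps=5000, seed=1):
    """Run constant-step SGD on an assumed linear model and average the iterates.

    Returns a dict mapping s to the l^s norm of (averaged iterate - true parameter).
    """
    rng = random.Random(seed)
    theta_star = [1.0] * d     # assumed true parameter
    theta = [0.0] * d          # SGD iterate
    theta_bar = [0.0] * d      # running Ruppert-Polyak average of the iterates
    for k in range(1, n_steps + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, 1.0)
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        theta = [ti - eta * residual * xi for ti, xi in zip(theta, x)]
        # Online mean update: theta_bar_k = theta_bar_{k-1} + (theta_k - theta_bar_{k-1}) / k
        theta_bar = [bi + (ti - bi) / k for bi, ti in zip(theta_bar, theta)]
    err = [bi - si for bi, si in zip(theta_bar, theta_star)]
    return {s: ls_norm(err, s) for s in (2, 4, math.inf)}


if __name__ == "__main__":
    for s, value in asgd_errors().items():
        print(f"l^{s} error of the averaged iterate: {value:.4f}")
```

Since $\|v\|_{\infty} \leq \|v\|_{s}$ for every finite $s$, the $\ell^{\infty}$ error is always the smallest of the reported values; the gap between the norms grows with the dimension `d`, which is one reason the choice of norm matters in high-dimensional settings.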

Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.