On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization

Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

University of Texas at Austin· University of Arizona· Google

bilevel optimization nonconvex optimization

Abstract

In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions for the lower-level objective—such as convexity or the Polyak–Łojasiewicz (PL) condition—guaranteeing global optimality is generally intractable.

Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity.

Specifically, to reach an

(ϵ_{f}, ϵ_{g})

-stationary point—where

ϵ_{f}

and

ϵ_{g}

denote the target stationarity accuracies for the upper- and lower-level objectives, respectively—the considered method achieves a complexity of

O (max (ϵ_{f}^{- \frac{3 + p}{1 + p}}, ϵ_{g}^{- \frac{3 + p}{2}}))

, where

p \geq 0

is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.