Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints

Online convex optimization Regret bounds Learning with constraints

Abstract

We study Online Convex Optimization with adversarial constraints (COCO). At each round a learner selects an action from a convex decision set and then an adversary reveals a convex cost and a convex constraint function. The goal of the learner is to select a sequence of actions to minimize both regret and the cumulative constraint violation (CCV) over a horizon of length $T$. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. In this paper, we improve this by trading off regret to achieve substantially smaller CCV. This trade-off is especially important in safety-critical applications, where satisfying the safety constraints is non-negotiable.
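
To make the two performance measures concrete, the following minimal sketch computes them for a fixed sequence of actions. It assumes the commonly used definitions, which the abstract itself does not spell out: regret is measured against a single fixed comparator action, and CCV sums only the positive part of each constraint value, with $g_t(x) \le 0$ meaning the constraint is satisfied. The function name `regret_and_ccv` and the toy data are hypothetical and serve illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Action = Sequence[float]
Fn = Callable[[Action], float]


def regret_and_ccv(
    actions: List[Action],
    costs: List[Fn],
    constraints: List[Fn],
    comparator: Action,
) -> Tuple[float, float]:
    """Regret against a fixed comparator and cumulative constraint violation.

    Assumed definitions (not stated in the abstract):
      regret = sum_t [f_t(x_t) - f_t(x*)]
      CCV    = sum_t max(0, g_t(x_t)), where g_t(x) <= 0 means "feasible".
    """
    regret = sum(f(x) - f(comparator) for f, x in zip(costs, actions))
    ccv = sum(max(0.0, g(x)) for g, x in zip(constraints, actions))
    return regret, ccv


# Toy one-dimensional run over T = 3 rounds (hypothetical data).
actions = [[0.0], [1.0], [0.5]]
costs = [lambda x: x[0]] * 3               # f_t(x) = x
constraints = [lambda x: 0.5 - x[0]] * 3   # feasible iff x >= 0.5
regret, ccv = regret_and_ccv(actions, costs, constraints, comparator=[0.5])
print(regret, ccv)
```

In this toy run only the first action violates the constraint, so it alone contributes to the CCV, while the low cost it earns offsets the high cost of the second action in the regret.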

Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $\tilde{O}(\sqrt{dT} + T^{\beta})$ regret and $\tilde{O}(d T^{1-\beta})$ CCV, where $d$ is the dimension of the decision set and $\beta \in [0, 1]$ is a tunable parameter. We begin with a special case, called the Constrained Expert problem, where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose a computationally efficient policy for the Constrained Expert problem that attains $O(\sqrt{T \ln N} + T^{\beta})$ regret and $\tilde{O}(T^{1-\beta} \ln N)$ CCV for $N$ experts.
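
The role of the tunable parameter $\beta$ can be read directly off the stated bounds. The sketch below is a hypothetical helper, not part of the proposed policy. It returns the exponents of $T$ in the regret and CCV bounds, ignoring the dimension $d$, constants, and logarithmic factors.

```python
from typing import Tuple


def tradeoff_exponents(beta: float) -> Tuple[float, float]:
    """Return (regret_exponent, ccv_exponent) in T for a given beta.

    Based on the stated bounds sqrt(d T) + T^beta (regret) and
    d * T^(1 - beta) (CCV); d, constants, and log factors are ignored.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    regret_exp = max(0.5, beta)  # sqrt(T) dominates T^beta when beta < 1/2
    ccv_exp = 1.0 - beta
    return regret_exp, ccv_exp


for beta in (0.5, 0.75, 1.0):
    print(beta, tradeoff_exponents(beta))
```

Under these bounds, choosing $\beta < 1/2$ worsens the CCV exponent without improving the regret exponent, so the useful range is $\beta \in [1/2, 1]$. At $\beta = 1/2$ both exponents equal $1/2$. As $\beta \to 1$ the CCV exponent tends to $0$ while the regret bound becomes linear in $T$.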

The original problem is then reduced to the Constrained Expert problem via a covering argument. Finally, with an additional $M$-smoothness assumption, we propose a computationally efficient first-order policy attaining $O(\sqrt{MT} + T^{\beta})$ regret and $\tilde{O}(M T^{1-\beta})$ CCV.