Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring

Federico Di Gennaro, Khaled Eldowa, Nicolò Cesa-Bianchi

EPFL · Università degli Studi di Milano · Politecnico di Milano

Keywords: partial monitoring, exploration-by-optimization, linear bandits, online learning

Abstract

In contrast to the classic formulation of partial monitoring, linear partial monitoring can model infinite outcome spaces, while imposing a linear structure on both the losses and the observations. This setting can be viewed as a generalization of linear bandits where loss and feedback are decoupled in a flexible manner.
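To make the decoupling of loss and feedback concrete, here is a minimal sketch of a single round. It assumes, in our own illustrative notation rather than necessarily the paper's, that actions and outcomes are vectors in R^d, that the loss of action a against outcome y is the inner product ⟨a, y⟩, and that each action a has an observation matrix Φ_a so the learner observes Φ_a y. The function `play_round` and the specific matrices are hypothetical. The learner's strategy (e.g., exploration-by-optimization) is not modeled; the sketch only shows how different choices of Φ_a recover linear bandits, full information, and feedback that is decoupled from the loss.

```python
import numpy as np

def play_round(action, outcome, obs_matrix):
    """One round of a hypothetical linear partial monitoring game.

    The loss is linear in the outcome (<a, y>), and so is the feedback
    (Phi_a @ y). The learner suffers the loss but only sees the feedback.
    """
    loss = float(action @ outcome)       # linear loss <a, y>
    feedback = obs_matrix @ outcome      # linear observation Phi_a y
    return loss, feedback

rng = np.random.default_rng(0)
d = 3
actions = np.eye(d)                      # three actions: standard basis vectors
y = rng.uniform(-1.0, 1.0, size=d)       # outcome (random here, adversarial in general)
a = actions[0]

# Linear bandit: Phi_a = a^T, so the feedback is exactly the incurred loss.
loss, fb = play_round(a, y, a.reshape(1, -1))
assert np.isclose(fb[0], loss)

# Full information: Phi_a = I, so the whole outcome vector is revealed.
loss, fb = play_round(a, y, np.eye(d))
assert np.allclose(fb, y)

# Decoupled feedback: playing action 0 reveals the loss of action 1, not its own.
loss, fb = play_round(a, y, actions[1].reshape(1, -1))
assert np.isclose(fb[0], actions[1] @ y)
```

Only the observation matrices change across the three cases; the loss is always ⟨a, y⟩, which is what lets the model interpolate between bandit and full-information feedback.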

In this work, we address a nonstochastic (adversarial), finite-action version of the problem through a simple instance of the exploration-by-optimization method that is amenable to efficient implementation. We derive regret bounds that depend on the game structure in a more transparent manner than previous theoretical guarantees for this paradigm.

Our bounds feature instance-specific quantities that reflect the degree of alignment between observations and losses, and resemble known guarantees in the stochastic setting. Notably, they achieve the standard $\sqrt{T}$ rate in easy (locally observable) games and $T^{2/3}$ in hard (globally observable) games, where $T$ is the time horizon.

We instantiate these bounds in a selection of old and new partial information settings subsumed by this model, and illustrate that the achieved dependence on the game structure can be tight in interesting cases.