Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#3110

Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

NeurIPS OpenReview

Abstract

We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0, 1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \Theta(1)$, the best previously known regret upper bound is $\tilde{O}\big(d\, T^{\frac{1}{1+\epsilon}}\big)$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \Theta(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \Theta(1)$ yields only an $\Omega\big(d^{\frac{\epsilon}{1+\epsilon}}\, T^{\frac{1}{1+\epsilon}}\big)$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits; in particular, it is a factor of $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$).
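To see the gap concretely, one can compare the multi-armed-bandit-style heavy-tailed rate $d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}}$ against the optimal finite-variance linear-bandit rate $d\sqrt{T}$ at $\epsilon = 1$. A minimal numeric check (the function names and the choices of $d$ and $T$ below are illustrative, not from the abstract):

```python
import math

def mab_style_rate(d, T, eps):
    """Multi-armed-bandit-style heavy-tailed rate: d^(eps/(1+eps)) * T^(1/(1+eps))."""
    return d ** (eps / (1 + eps)) * T ** (1 / (1 + eps))

def optimal_linear_rate(d, T):
    """Optimal finite-variance (eps = 1) linear bandit rate: d * sqrt(T)."""
    return d * math.sqrt(T)

d, T = 100, 10**6
ratio = optimal_linear_rate(d, T) / mab_style_rate(d, T, eps=1.0)
print(ratio)  # at eps = 1 the ratio equals sqrt(d), i.e. 10.0 here
```

At $\epsilon = 1$ the two rates differ by exactly $d^{1 - 1/2} = \sqrt{d}$, which is the looseness referred to above.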
We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{O}\big(d^{\frac{1+3\epsilon}{2(1+\epsilon)}}\, T^{\frac{1}{1+\epsilon}}\big)$, thus improving the dependence on $d$ for all $\epsilon < 1$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega\big(d^{\frac{2\epsilon}{1+\epsilon}}\, T^{\frac{1}{1+\epsilon}}\big)$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems.
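The general elimination-plus-design recipe can be sketched generically: in each phase, compute a (near-)G-optimal design over the surviving arms, sample according to it, estimate the parameter, and discard arms whose estimated gap exceeds the phase's confidence width. The sketch below is a toy illustration only, assuming Gaussian noise and a plain least-squares estimate; it does not reproduce the paper's algorithm, whose heavy-tail-robust estimation is the key ingredient, and all function names, the noise level, and the width schedule are illustrative assumptions.

```python
import numpy as np

def g_optimal_weights(A, iters=300):
    """Frank-Wolfe (Fedorov-Wynn) approximation of the G-optimal design on rows of A."""
    k, d = A.shape
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        V = A.T @ (w[:, None] * A)
        lev = np.einsum('ij,jk,ik->i', A, np.linalg.pinv(V), A)  # a^T V(w)^{-1} a
        j = int(np.argmax(lev))
        if lev[j] <= d + 1e-9:
            break  # Kiefer-Wolfowitz optimality condition reached
        step = (lev[j] - d) / (d * (lev[j] - 1.0))
        w *= (1.0 - step)
        w[j] += step
    return w

def phased_elimination(A, theta, phases=6, n_per_phase=2000, rng=None):
    """Toy phased elimination: sample surviving arms per a G-optimal design,
    fit least squares, and drop arms whose estimated gap exceeds twice the
    (illustrative, geometrically shrinking) confidence width."""
    rng = np.random.default_rng(0) if rng is None else rng
    alive = np.arange(len(A))
    for ell in range(phases):
        S = A[alive]
        w = g_optimal_weights(S)
        counts = np.maximum(1, np.ceil(w * n_per_phase).astype(int))
        X = np.repeat(S, counts, axis=0)          # pulls allocated by the design
        y = X @ theta + 0.1 * rng.standard_normal(len(X))
        theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        means = S @ theta_hat
        width = 2.0 ** (-ell)
        alive = alive[means >= means.max() - 2 * width]
        if len(alive) == 1:
            break
    return alive
```

On an easy instance (e.g. standard basis arms plus one near-optimal arm), the surviving set shrinks to the single best arm within a few phases.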
For finite action sets of size $n$, we also derive corresponding upper and lower bounds.
Finally, we provide action-set-dependent regret upper bounds and show that for some geometries, such as certain $\ell_p$-norm balls, we can further reduce the dependence on $d$.