Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

KTH · Université Paris-Saclay · CNRS · CentraleSupélec

Multi-armed bandits · Structured bandits · Non-convex optimization

NeurIPS

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes.
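As a rough illustration of the multimodality constraint, the sketch below counts the modes of a vector of expected rewards. It assumes, purely for illustration, that the arms are arranged on a line and that a mode is a strict local maximum with respect to its immediate neighbours; the names `count_modes` and `is_multimodal` are hypothetical and are not taken from the paper, whose definition of a mode may be more general.

```python
from typing import Sequence


def count_modes(mu: Sequence[float]) -> int:
    """Count strict local maxima of mu, assuming arms are arranged on a line.

    Assumption (not from the paper): arm i has neighbours i - 1 and i + 1,
    and an endpoint only needs to beat its single neighbour.
    """
    n = len(mu)
    modes = 0
    for i in range(n):
        higher_than_left = i == 0 or mu[i] > mu[i - 1]
        higher_than_right = i == n - 1 or mu[i] > mu[i + 1]
        if higher_than_left and higher_than_right:
            modes += 1
    return modes


def is_multimodal(mu: Sequence[float], m: int) -> bool:
    """Return True if mu has at most m modes under the assumptions above."""
    return count_modes(mu) <= m
```

For example, under these assumptions the reward vector `[0.1, 0.5, 0.2, 0.7, 0.3]` has two modes (at indices 1 and 3), so it satisfies the constraint for any m of at least 2.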

We propose the first known computationally tractable algorithm for solving the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem.
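For readers unfamiliar with it, the Graves-Lai optimization problem for structured bandits is usually written in the following generic form. The notation here is an assumption made for illustration and may differ from the paper's: μ_k is the expected reward of arm k, μ* is the largest expected reward, k* is an optimal arm, d(·,·) is the Kullback-Leibler divergence between reward distributions parameterized by their means, and Λ(μ) is the set of "confusing" reward functions that respect the structure (here, at most m modes), agree with μ on the optimal arm, and have a different optimal arm.

```latex
\begin{aligned}
C(\mu) \;=\; \min_{\eta \ge 0} \quad & \sum_{k} \eta_k \,(\mu^\star - \mu_k) \\
\text{subject to} \quad & \sum_{k} \eta_k \, d(\mu_k, \lambda_k) \;\ge\; 1
\quad \text{for all } \lambda \in \Lambda(\mu), \\
\text{where} \quad & \Lambda(\mu) = \bigl\{ \lambda \text{ with at most } m \text{ modes} :
\lambda_{k^\star} = \mu_{k^\star},\ \max_k \lambda_k > \mu^\star \bigr\}.
\end{aligned}
```

In this standard formulation, the value C(μ) gives the asymptotic regret lower bound, in the sense that the regret of any uniformly good algorithm grows at least as C(μ) log T, and the minimizer η indicates how often each suboptimal arm should be explored.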