Eluder dimension: localise it!

Alireza Bakhtiari, Alex Ayoub, Samuel McLaughlin Robertson, David Janz, Csaba Szepesvari

University of Alberta · Oxford University

eluder dimension, bandits, reinforcement learning

Abstract

We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder-dimension-based analyses cannot yield first-order regret bounds.

To address this, we introduce a localisation method for the eluder dimension. Our analysis immediately recovers and improves on classic results for Bernoulli bandits, and yields the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.