logo
today local_bar
Poster Session 3 · Thursday, December 4, 2025 11:00 AM → 2:00 PM
#5501 Spotlight

Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback

NeurIPS OpenReview

Abstract

We present regret minimization algorithms for the contextual multi-armed bandit (CMAB) problem over actions in the presence of delayed feedback, a scenario where loss observations arrive with delays chosen by an adversary. As a preliminary result, assuming direct access to a finite policy class we establish an optimal expected regret bound of where is the sum of delays.
For our main contribution, we study the general function approximation setting over a (possibly infinite) contextual loss function class with access to an online least-square regression oracle over . In this setting, we achieve an expected regret bound of assuming FIFO order, where is the maximal delay, is an upper bound on the oracle's regret and is a stability parameter associated with the oracle.
We complement this general result by presenting a novel stability analysis of a Hedge-based version of Vovk's aggregating forecaster as an oracle implementation for least-square regression over a finite function class and show that its stability parameter is bounded by , resulting in an expected regret bound of which is a factor away from the lower bound of that we also present.