Lecturer, Imperial College London
3 papers at NeurIPS 2025
We propose a novel regret analysis of a simple policy gradient algorithm for bandits, characterizing distinct regret regimes as a function of its learning rate.
We study the stochastic shortest path problem with sparse adversarial costs under known transitions, and characterize the minimax regret achieved by online mirror descent (OMD) with a novel $\ell_r$-norm regularizer, $r \in [1,2]$.
Our work shows that in online convex optimization over $\ell_p$-balls ($p > 2$), anytime optimality can be achieved with Follow-the-Regularized-Leader (FTRL) using adaptive regularization, and that for separable regularizers this adaptivity is necessary.