Researcher, INRIA
2 papers at NeurIPS 2025
We propose a novel regret analysis of a simple policy gradient algorithm for bandits, characterizing regret regimes depending on its learning rate.
We study Bandit Convex Optimization in non-stationary environments by establishing upper and lower regret bounds.