Poster Session 3 · Thursday, December 4, 2025 11:00 AM → 2:00 PM
#800

Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

NeurIPS OpenReview

Abstract

We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ arms from the $d$ total arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, FTRL requires explicitly computing the arm-selection probabilities by solving optimization problems at each time step and then sampling according to them.
This overhead can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms whose randomly perturbed (estimated) losses rank among the $m$ smallest.
In this paper, we show that FTPL with a Fréchet perturbation also enjoys a near-optimal regret bound in the adversarial setting and attains best-of-both-worlds regret bounds, i.e., it achieves logarithmic regret in the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.
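To illustrate the selection step described above, here is a minimal sketch of one FTPL round with a Fréchet perturbation. This is not the paper's exact algorithm (in particular, the loss estimator and the perturbation-scale parameter are omitted or simplified); the names `eta` and `alpha` and the choice of shape parameter are illustrative assumptions.

```python
import numpy as np

def frechet(rng, size, alpha=2.0):
    """Sample Fréchet(alpha) variates via inverse-CDF:
    if U ~ Uniform(0,1), then (-log U)^(-1/alpha) ~ Fréchet(alpha)."""
    u = rng.uniform(size=size)
    return (-np.log(u)) ** (-1.0 / alpha)

def ftpl_select(cum_loss_est, m, eta, rng, alpha=2.0):
    """One FTPL round for the m-set semi-bandit (illustrative sketch).

    Perturbs each arm's cumulative estimated loss by subtracting an
    independent Fréchet sample scaled by eta, then pulls the m arms
    with the smallest perturbed losses.  No convex optimization or
    explicit arm-probability computation is needed.
    """
    cum_loss_est = np.asarray(cum_loss_est, dtype=float)
    z = frechet(rng, size=cum_loss_est.shape[0], alpha=alpha)
    perturbed = cum_loss_est - eta * z
    return np.argsort(perturbed)[:m]

# Example: pick 3 of 10 arms from (hypothetical) cumulative loss estimates.
rng = np.random.default_rng(0)
arms = ftpl_select(np.zeros(10), m=3, eta=1.0, rng=rng)
```

This contrasts with FTRL, which at each step would solve an optimization problem to obtain a distribution over $m$-sets before sampling; here the randomness enters only through the perturbation, making each round a single sort.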