Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#3015
Optimal Regret of Bandits under Differential Privacy
Abstract
As sequential learning algorithms are increasingly deployed in real-life applications, ensuring data privacy while maintaining utility has become a timely question. In this context, regret minimisation in stochastic bandits under ε-global Differential Privacy (DP) has been widely studied. The existing literature leaves a significant gap between the best-known regret lower and upper bounds in this setting, even though they "match in order". Thus, we revisit the regret lower and upper bounds of ε-global DP bandits and improve both.
First, we prove a tighter regret lower bound involving a novel information-theoretic quantity that characterises the hardness of ε-global DP in stochastic bandits. This quantity smoothly interpolates between the Kullback–Leibler divergence and the Total Variation distance, depending on the privacy budget ε. Then, we take two asymptotically optimal bandit algorithms, namely KL-UCB and IMED, and propose their DP versions using a unified blueprint, i.e.,
- running in arm-dependent phases, and
- adding Laplace noise to achieve privacy.
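The privacy step of this blueprint can be sketched as follows. This is an illustrative helper, not the authors' code: it releases the mean of one phase's rewards through the standard Laplace mechanism, assuming rewards in [0, 1] so the per-phase sum has sensitivity 1.

```python
import numpy as np

def laplace_phased_mean(rewards, eps, rng=None):
    """Release an eps-DP estimate of the mean reward of one phase.

    Hypothetical sketch of the blueprint's noise-addition step: each reward
    lies in [0, 1], so changing one reward moves the sum by at most 1
    (sensitivity 1), and Laplace noise of scale 1/eps suffices for eps-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(rewards)
    # Laplace mechanism: noise the sum once per phase, then normalise.
    noisy_sum = float(np.sum(rewards)) + rng.laplace(loc=0.0, scale=1.0 / eps)
    return noisy_sum / n
```

Noising the sum once per arm-dependent phase, rather than after every pull, is what keeps the total privacy cost from growing with the horizon.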
For Bernoulli bandits, we analyse the regret of these algorithms and show that it asymptotically matches our lower bound up to a constant arbitrarily close to 1. At the core of our algorithms lies a new concentration inequality for sums of Bernoulli variables under the Laplace mechanism, which serves as a DP version of the Chernoff bound.
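For context, the simplest way to control such noisy sums is a union bound combining Hoeffding's inequality with the Laplace tail bound; the sketch below shows that standard baseline argument in our own notation, not the paper's sharper inequality.

```latex
% Baseline concentration for a Laplace-noised Bernoulli sum (illustrative only).
% Let S_n = X_1 + \dots + X_n with X_i i.i.d. Bernoulli(\mu),
% and let L \sim \mathrm{Lap}(1/\varepsilon). For any \gamma > 0,
% split the deviation between the two sources of randomness:
\[
\mathbb{P}\!\left(\frac{S_n + L}{n} \ge \mu + \gamma\right)
\le \underbrace{\mathbb{P}\!\left(\frac{S_n}{n} \ge \mu + \frac{\gamma}{2}\right)}_{\text{Hoeffding}}
 + \underbrace{\mathbb{P}\!\left(\frac{L}{n} \ge \frac{\gamma}{2}\right)}_{\text{Laplace tail}}
\le e^{-n\gamma^2/2} + \tfrac{1}{2}\, e^{-\varepsilon n \gamma/2}.
\]
```

A sharper DP Chernoff-type bound avoids the loss incurred by splitting the deviation in half between the sampling noise and the privacy noise.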
Finally, our numerical experiments validate that DP-KLUCB and DP-IMED achieve lower regret than existing ε-global DP bandit algorithms.