Associate Professor, Kyoto University
3 papers at NeurIPS 2025
This paper aims to broaden the theoretical foundation of Follow-the-Perturbed-Leader (FTPL) and to emphasize the need for further investigation into the behavior of FTPL in more general settings.
We propose an algorithm for estimating the best mean reward in a multi-armed bandit with asymptotically optimal, instance-adaptive sample complexity.