IBM Research – Tokyo - NeurIPS 2025

1 paper across 1 session

Poster Session 5

We propose an algorithm that estimates the best mean reward in a multi-armed bandit and achieves asymptotically optimal, instance-adaptive sample complexity.
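To make the problem setting concrete, the sketch below shows a naive uniform-sampling baseline for estimating the best mean reward. It is not the algorithm proposed in the paper, and it is neither instance-adaptive nor asymptotically optimal. It assumes rewards bounded in [0, 1] and uses Hoeffding's inequality with a union bound to pick a fixed number of pulls per arm. The names `samples_per_arm`, `estimate_best_mean`, and `pull` are hypothetical and chosen only for illustration.

```python
import math
import random


def samples_per_arm(num_arms, epsilon, delta):
    """Pulls per arm so that, by Hoeffding's inequality and a union bound,
    every empirical mean is within epsilon of its true mean with
    probability at least 1 - delta (assumes rewards lie in [0, 1])."""
    return math.ceil(math.log(2 * num_arms / delta) / (2 * epsilon ** 2))


def estimate_best_mean(pull, num_arms, epsilon, delta):
    """Naive baseline: pull every arm equally often and return the largest
    empirical mean together with the total number of pulls used.

    `pull(arm)` is a hypothetical callback returning one reward in [0, 1].
    """
    n = samples_per_arm(num_arms, epsilon, delta)
    empirical_means = []
    for arm in range(num_arms):
        total = sum(pull(arm) for _ in range(n))
        empirical_means.append(total / n)
    # If all empirical means are epsilon-accurate, so is their maximum.
    return max(empirical_means), n * num_arms


if __name__ == "__main__":
    # Illustrative Bernoulli bandit; these means are made up for the demo.
    true_means = [0.2, 0.5, 0.8]
    rng = random.Random(0)

    def bernoulli_pull(arm):
        return 1.0 if rng.random() < true_means[arm] else 0.0

    estimate, total_pulls = estimate_best_mean(
        bernoulli_pull, len(true_means), epsilon=0.1, delta=0.05
    )
    print(f"estimated best mean: {estimate:.3f} using {total_pulls} pulls")
```

This baseline spends the same budget on every arm regardless of how far each arm is from the best one. An instance-adaptive method, as described in the abstract, would instead adjust its sampling to the difficulty of the specific bandit instance.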