We propose DistLCB, a multi-risk bandit algorithm for heavy-tailed rewards that uses Wasserstein-based confidence bounds to achieve Pareto optimality with provable regret guarantees.