We propose the first prior-free algorithm that achieves near-optimal dynamic regret for non-stationary multi-armed bandits under constrained feedback.