2 papers across 2 sessions
We propose the first prior-free algorithm that achieves near-optimal dynamic regret for non-stationary multi-armed bandits under constrained feedback.
We study non-stationary Lipschitz bandits and achieve the minimax-optimal rate without prior knowledge of the non-stationarity.