We study a feature-perturbing exploration method applicable to a variety of bandit settings, and prove that this randomized method achieves an optimal regret guarantee.
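To make the idea concrete, the following is a minimal sketch of randomized, perturbation-based exploration in a linear bandit. It perturbs the ridge estimate along the uncertainty directions of the inverse Gram matrix; all names (`theta_star`, `arms`, the perturbation scale `c`) and the Gaussian perturbation itself are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 20, 2000            # feature dimension, number of arms, horizon
lam, noise_sd, c = 1.0, 0.1, 0.5  # ridge parameter, reward noise, perturbation scale

# Unknown parameter and fixed unit-norm arm features (synthetic instance).
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
best_reward = (arms @ theta_star).max()

V = lam * np.eye(d)   # regularized Gram matrix
b = np.zeros(d)       # running sum of reward-weighted features
regret = 0.0
for t in range(T):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b  # ridge estimate of theta_star
    # Randomized exploration: perturb the estimate along the
    # (shrinking) uncertainty directions encoded by V^{-1}.
    theta_tilde = theta_hat + c * np.linalg.cholesky(V_inv) @ rng.normal(size=d)
    a = int(np.argmax(arms @ theta_tilde))   # greedy w.r.t. perturbed estimate
    x = arms[a]
    r = x @ theta_star + noise_sd * rng.normal()  # noisy reward
    V += np.outer(x, x)
    b += r * x
    regret += best_reward - x @ theta_star
```

Because the perturbation is scaled by the Cholesky factor of `V_inv`, its magnitude decays as data accumulates, so exploration is automatically tapered without confidence-interval bookkeeping; the cumulative regret grows sublinearly in `T` on this toy instance.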