1 paper across 1 session
We propose the first Thompson Sampling algorithm with Pareto regret guarantees for multi-objective linear contextual bandits.