6 papers across 3 sessions
We present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets.
We study meta-learning in linear bandits and provide provably fast, sample-efficient algorithms to learn a common set of features from multiple related bandit tasks and to transfer this knowledge to new, unseen bandit tasks.
We develop methods based on Thompson Sampling for safe linear bandits that significantly reduce computational cost while matching the regret and risk performance of existing approaches.
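To illustrate the common primitive behind these papers, here is a minimal sketch of Thompson Sampling for a linear bandit (plain, without the safety constraints or meta-learning the papers address): maintain a Gaussian posterior over the unknown parameter, sample from it each round, and act greedily against the sample. All dimensions, priors, and noise levels below are illustrative assumptions, not values from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T, noise, lam = 5, 2000, 0.1, 1.0  # assumed toy problem sizes
theta_true = rng.normal(size=d)
theta_true /= np.linalg.norm(theta_true)  # unknown unit parameter

# Gaussian posterior over theta: precision B, mean inv(B) @ f
B = lam * np.eye(d)   # regularized design (precision) matrix
f = np.zeros(d)       # sum of reward-weighted feature vectors

total_reward = 0.0
for t in range(T):
    # A fresh random set of 10 unit-norm candidate actions each round
    actions = rng.normal(size=(10, d))
    actions /= np.linalg.norm(actions, axis=1, keepdims=True)

    # Sample a parameter from the posterior, act greedily against it
    cov = np.linalg.inv(B)
    theta_sample = rng.multivariate_normal(cov @ f, noise**2 * cov)
    a = actions[np.argmax(actions @ theta_sample)]

    # Observe a noisy linear reward and update the posterior
    r = a @ theta_true + noise * rng.normal()
    B += np.outer(a, a)
    f += r * a
    total_reward += r

theta_hat = np.linalg.inv(B) @ f
print(np.linalg.norm(theta_hat - theta_true))  # estimation error shrinks with T
```

The safe-bandit work summarized above constrains which actions may be played each round, and the meta-learning work shares the feature representation across tasks; both build on this sample-then-maximize loop.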