1 paper across 1 session
We derive no-regret guarantees for Thompson sampling in episodic reinforcement learning with Gaussian process modelling.