2 papers across 2 sessions
We derive no-regret guarantees for Thompson sampling in episodic reinforcement learning with Gaussian process modelling.
This paper presents a tunable algorithm for online convex optimization with adversarial constraints that reduces cumulative constraint violation to below $O(\sqrt{T})$ by trading it off against regret.