4 papers across 2 sessions
We present surrogate regret upper bounds for online structured prediction with bandit and delayed feedback.
We present regret bounds for adversarial contextual bandits with general function approximation under delayed bandit feedback.
We propose the first Best-of-Both-Worlds algorithm for multi-armed bandits with adversarial delays that matches lower bounds in both stochastic and adversarial settings, significantly improving on previous results.
This study raises and addresses the problem of time-delayed feedback in learning in games.