We present regret bounds for adversarial contextual bandits with general function approximation under delayed bandit feedback.