3 papers across 2 sessions
We study Bandit Convex Optimization in non-stationary environments, establishing upper and lower bounds on the dynamic regret.
We propose a human-in-the-loop learning method that achieves faithful imitation via distribution alignment and adapts to evolving behavior using dynamic regret minimization.
We frame dynamic regret minimization as a static regret problem in a reproducing kernel Hilbert space (RKHS).
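For context, the two notions of regret referenced above can be sketched in their standard textbook form (this formulation is not taken from the papers themselves): static regret compares the learner against the best fixed comparator in hindsight, while dynamic regret compares against a time-varying comparator sequence.

```latex
% Static regret: performance gap to the best fixed point x in hindsight
\mathrm{Reg}_T^{\mathrm{static}}
  = \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)

% Dynamic regret: performance gap to a time-varying comparator sequence u_1, \dots, u_T
\mathrm{Reg}_T^{\mathrm{dynamic}}
  = \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u_t)
```

Since the dynamic benchmark is at least as strong as any fixed comparator, dynamic regret upper-bounds static regret; bounds on it typically depend on a measure of how much the comparator sequence (or the environment) varies over time.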