Assistant Professor, University of Illinois at Chicago
2 papers at NeurIPS 2025
We design the first efficient algorithm with near-optimal regret for contextual dueling bandits using offline oracles, enabling scalable preference-based learning in RLHF and resolving a key open problem in AI alignment.
This paper develops stochastic dominance as an objective for imitation learning, providing stronger guarantees when demonstrators have differing preferences.