Associate Professor, University of Virginia, Charlottesville
2 papers at NeurIPS 2025
This work shows that greedy sampling based on empirical estimates is provably efficient for RLHF, under both the general preference model and the Bradley-Terry model.