Associate Professor, University of Virginia, Charlottesville
2 papers at NeurIPS 2025
This work shows that greedy sampling based on empirical estimates is provably efficient for RLHF, under both the general preference model and the Bradley-Terry model.
This paper proposes a new single-loop first-order algorithm for solving linearly constrained bilevel optimization problems.