Postdoc, Princeton University
1 paper at NeurIPS 2025
This work shows that greedy sampling based on empirical estimates is provably efficient for reinforcement learning from human feedback (RLHF) under both the general preference model and the Bradley-Terry model.
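
For readers unfamiliar with the two settings named above, the standard textbook definitions can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: x is a prompt, y_1 and y_2 are two candidate responses, r is a latent reward function, and \sigma is the logistic function.

```latex
% Bradley-Terry model: preferences are induced by a latent reward r
% (symbols x, y_1, y_2, r, \sigma are illustrative assumptions).
\[
  \mathbb{P}(y_1 \succ y_2 \mid x)
    = \sigma\bigl(r(x, y_1) - r(x, y_2)\bigr)
    = \frac{\exp\bigl(r(x, y_1)\bigr)}
           {\exp\bigl(r(x, y_1)\bigr) + \exp\bigl(r(x, y_2)\bigr)},
  \qquad
  \sigma(z) = \frac{1}{1 + e^{-z}}.
\]

% General preference model: no reward function is assumed; the preference
% probability is only required to be a valid pairwise comparison.
\[
  \mathbb{P}(y_1 \succ y_2 \mid x) \in [0, 1],
  \qquad
  \mathbb{P}(y_1 \succ y_2 \mid x) + \mathbb{P}(y_2 \succ y_1 \mid x) = 1.
\]
```

The Bradley-Terry model is a special case of the general preference model: every reward function r induces a valid pairwise preference, but a general preference (for example, an intransitive one) need not correspond to any reward function.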