PhD student, Boston University
2 papers at NeurIPS 2025
We introduce a new online RLHF algorithm that is the first to achieve a sample complexity scaling polynomially with the reward scale.