Associate Professor, Gwangju Institute of Science and Technology
1 paper at NeurIPS 2025
Proposes a theoretically grounded method for preference distillation that uses a teacher value function and potential-based reward shaping (PBRS)