PhD student, Korea Advanced Institute of Science & Technology
1 paper at NeurIPS 2025
Proposes a theoretically grounded method for preference distillation that uses the teacher's value function with potential-based reward shaping (PBRS)