1 paper across 1 session
Proposes a theoretically grounded method for preference distillation that uses the teacher's value function and potential-based reward shaping (PBRS).