We provide a theoretical analysis of forward and reverse KL-regularized RLHF under multiple reference models.