3 papers across 3 sessions
We present the first theoretical analysis of preference-based reinforcement learning (PbRL) with ranking feedback, showing that longer ranking feedback can provably improve sample efficiency.
We propose Diff-UAPA, a novel framework that aligns diffusion policies with human preferences by integrating uncertainty-aware objectives and maximum a posteriori (MAP) estimation.