We present the first theoretical analysis of preference-based reinforcement learning (PbRL) with ranking feedback, showing that feedback over longer rankings can provably improve sample efficiency.