2 papers across 2 sessions
We present the first theoretical analysis of PbRL with ranking feedback, showing that longer ranking feedback can provably improve sample efficiency.