PhD student, Seoul National University
2 papers at NeurIPS 2025
We show that in contextual cascading bandits, regret vanishes as the cascade length grows, with nearly matching upper and lower bounds.
We present the first theoretical analysis of preference-based reinforcement learning (PbRL) with ranking feedback, showing that longer ranking feedback provably improves sample efficiency.