PhD student, Seoul National University
2 papers at NeurIPS 2025
We study a feature-perturbing exploration method applicable to various bandit settings and prove that our randomized method achieves an optimal regret guarantee.
We present the first theoretical analysis of preference-based reinforcement learning (PbRL) with ranking feedback, showing that longer ranking feedback can provably improve sample efficiency.