2 papers across 2 sessions
We show that in contextual cascading bandits, regret vanishes as the cascade length grows, and we establish nearly matching upper and lower bounds.
We develop DISCOVER, an exploration strategy that enables RL agents to solve substantially more challenging tasks than previous approaches.