4 papers across 3 sessions
We show that, in contextual cascading bandits, regret vanishes as the cascade length grows, and we establish nearly matching upper and lower bounds.
We propose a computationally tractable multinomial logit (MNL) contextual bandit algorithm designed to handle general non-linear parametric utility functions.
We present an algorithm for test-time scaling of SDE-based diffusion models that searches for noise trajectories optimizing arbitrary rewards; empirically, it matches or exceeds the performance of Monte Carlo tree search (MCTS).