Assistant Professor, Purdue University
3 papers at NeurIPS 2025
We study the value of stochastic predictions in online optimal control with random disturbances.
This work presents the first finite-time analysis of average-reward Q-learning with asynchronous updates based on a single trajectory of Markovian samples.
We introduce a Bayesian value framework and a Bellman–Jensen Gap analysis to rigorously quantify and exploit imperfect multi-step transition predictions, and present BOLA, the first provably sample-efficient algorithm for RL with predictions.