1 paper across 1 session
We introduce a Bayesian value framework and a Bellman–Jensen Gap analysis to rigorously quantify and exploit imperfect multi-step transition predictions, and present BOLA, the first provably sample-efficient algorithm for RL with predictions.