Postdoc, University of California, Berkeley
2 papers at NeurIPS 2025
A scalable, simple, and practical algorithm for model-based RL, with regret bounds across several RL settings and experiments on state-based, visual-control, and hardware tasks.
We find that online multi-task RL with high-capacity value models leads to SOTA sample efficiency and performance.