PhD student, Carnegie Mellon University
2 papers at NeurIPS 2025
We introduce a novel offline RL algorithm that leverages shortcut models to scale both training and inference.
Rather than learning a policy directly from expert demonstrations, we learn world and reward models, which let us search at test time and recover from mistakes.