Undergraduate student, Korea Advanced Institute of Science and Technology (KAIST)
1 paper at NeurIPS 2025
Rather than learning a policy directly from expert demonstrations, we learn world and reward models, enabling us to search at test time and recover from mistakes.
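The idea of test-time search with learned models can be sketched as follows. This is a minimal illustration, not the paper's method: `world_model` and `reward_model` are hypothetical stand-ins for learned networks, with toy dynamics (integer states, goal state 10), and planning is done by exhaustive search over short action sequences.

```python
import itertools

def world_model(state, action):
    # Stand-in for a learned dynamics model: predicts the next state.
    # Toy deterministic dynamics on an integer line.
    return state + action

def reward_model(state):
    # Stand-in for a learned reward model.
    # Toy reward: negative distance to a goal state of 10.
    return -abs(10 - state)

def plan(state, horizon=5, actions=(-1, 0, 1)):
    """Test-time search: roll every candidate action sequence through the
    learned world model, score it with the learned reward model, and
    return the first action of the best sequence."""
    best_first, best_return = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:
            s = world_model(s, a)
            ret += reward_model(s)
        if ret > best_return:
            best_first, best_return = seq[0], ret
    return best_first
```

Because planning happens at decision time against the model, the agent can replan from any state it reaches, which is what allows recovery from mistakes rather than replaying memorized expert actions.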