PhD student, Georgia Institute of Technology
1 paper at NeurIPS 2025
A self-supervised method that improves open-weight value models using state-transition dynamics, enabling reward-free, efficient search with performance comparable to that of search using costly large models and tree-based methods.