3 papers across 2 sessions
A self-supervised method that improves open-weight value models using state-transition dynamics, enabling reward-free, efficient search with performance comparable to search with costly large models and tree-based methods.
A novel approach; very practical