PhD student, Massachusetts Institute of Technology
1 paper at NeurIPS 2025
Unifies supervised and reinforcement fine-tuning, outperforming both, with theoretical justification.