Assistant Professor, Harvard University
4 papers at NeurIPS 2025
We introduce a novel offline RL algorithm that leverages shortcut models to scale both training and inference.
We introduce a theoretically grounded distributional RL algorithm for LLM post-training that improves upon prior work on both synthetic and mathematical reasoning tasks.