Professor, Cornell University
1 paper at NeurIPS 2025
We introduce a theoretically grounded distributional RL algorithm for LLM post-training that improves on prior work on both synthetic and mathematical reasoning tasks.