PhD student, Department of Computer Science, Cornell University
2 papers at NeurIPS 2025
We introduce a theoretically grounded distributional RL algorithm for LLM post-training that improves over prior work on both synthetic and mathematical reasoning tasks.