Postdoc, Cornell University
3 papers at NeurIPS 2025
We introduce a novel offline RL algorithm that leverages shortcut models to scale both training and inference.
We introduce a new online RLHF algorithm that, for the first time, achieves sample complexity scaling polynomially with the reward scale.
We incentivize truthful data sharing using a novel technique inspired by two-sample testing.