Full Professor, National University of Singapore
3 papers at NeurIPS 2025
AnytimeReasoner optimizes LLM reasoning under variable token budgets by introducing verifiable dense rewards and a variance reduction method (BRPO), enabling more efficient RL for both final and anytime reasoning performance.
We study the approximation and generalization abilities of score-based neural network generative models.