Postdoc, Hong Kong Polytechnic University
2 papers at NeurIPS 2025
We propose a diversity-aware policy optimization method for LLM reasoning that encourages token-level diversity on positive samples, achieving larger performance gains on mathematical benchmarks while generating more diverse solutions.