PhD student, Hong Kong Polytechnic University
1 paper at NeurIPS 2025
We propose a diversity-aware policy optimization method for LLM reasoning that introduces token-level diversity focused on positive samples, achieving larger performance gains on mathematical benchmarks while generating more diverse solutions.