Assistant Professor, Hong Kong Polytechnic University
3 papers at NeurIPS 2025
A state-of-the-art sequence parallelism method for linear attention, built on a new collective communication primitive.
We propose a diversity-aware policy optimization method for LLM reasoning that introduces a token-level diversity objective over positive samples, achieving larger performance gains on mathematical benchmarks while generating more diverse solutions.