Associate Professor, Hong Kong Polytechnic University
2 papers at NeurIPS 2025
A comprehensive benchmarking platform for Meta-Black-Box Optimization approaches that provides efficient training and evaluation and flexible usage for potential users.
We propose a diversity-aware policy optimization method for LLM reasoning that introduces token-level diversity focused on positive samples, achieving larger performance gains on mathematical benchmarks while generating more diverse solutions.