Assistant Professor, The Hong Kong University of Science and Technology (Guangzhou)
2 papers at NeurIPS 2025
We present Reward Dithering, a technique that enhances reinforcement learning in large language models by adding random perturbations to reward signals, improving training efficiency and convergence speed while maintaining performance.
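The core idea can be sketched in a few lines: perturb each scalar reward with zero-mean noise before the policy update. This is a minimal illustration only; the function name `dither_rewards`, the Gaussian noise model, and the `sigma` hyperparameter are assumptions for demonstration, not the paper's exact formulation.

```python
import random

def dither_rewards(rewards, sigma=0.05, seed=None):
    """Add zero-mean Gaussian noise to each scalar reward.

    sigma is a hypothetical noise-scale hyperparameter; the paper's
    actual perturbation scheme may differ. With sigma=0 the rewards
    pass through unchanged.
    """
    rng = random.Random(seed)
    return [r + rng.gauss(0.0, sigma) for r in rewards]

# Example: dither a small batch of rewards before the RL update step.
rewards = [1.0, 0.0, 0.5]
dithered = dither_rewards(rewards, sigma=0.1, seed=42)
```

Because the noise is zero-mean, the expected reward is unchanged, which is what lets the perturbation aid exploration and convergence without biasing the learning signal.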