PhD student, City University of Hong Kong
2 papers at NeurIPS 2025
We develop a reasoning-induced no-reference image quality assessment (NR-IQA) model trained via reinforcement learning to rank.
DP²O-SR post-trains generative super-resolution (SR) models to better match human perceptual preferences. It optimizes over diverse outputs, obtained by varying only the sampling noise, using IQA-based rewards, so no human annotations are required during training.
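As a rough illustration of the "noise-only sampling plus IQA reward" idea, the sketch below draws several outputs for one input by changing only the noise seed, scores each with an IQA function, and keeps the best and worst as a preference pair. This is a minimal sketch under stated assumptions, not the DP²O-SR implementation: `generate` and `iqa_score` are hypothetical stand-ins for an SR sampler and an IQA model, and the best-versus-worst pairing rule is just one simple choice.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")  # type of an SR output (e.g. an image tensor); generic here


def build_preference_pair(
    generate: Callable[[int], T],      # hypothetical: SR sampler, noise seed -> output
    iqa_score: Callable[[T], float],   # hypothetical: IQA model, higher = better
    num_samples: int = 8,
) -> Tuple[T, T]:
    """Sample outputs that differ only in their noise seed, rank them by
    IQA score, and return (preferred, rejected) = (best, worst)."""
    if num_samples < 2:
        raise ValueError("need at least two samples to form a pair")
    # Same input every time; only the seed (i.e. the sampling noise) changes.
    candidates = [generate(seed) for seed in range(num_samples)]
    # No human labels: the IQA score alone decides the ranking.
    ranked = sorted(candidates, key=iqa_score, reverse=True)
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    # Toy stand-ins: the "output" is just the seed, and the toy IQA score
    # peaks at seed 3, so seed 3 is preferred and seed 7 is rejected.
    pair = build_preference_pair(lambda s: s, lambda x: -float((x - 3) ** 2))
    print(pair)
```

Pairs built this way could then feed a preference-optimization objective; which objective and which pairing scheme to use are design choices this sketch leaves open.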