PhD student, Indraprastha Institute of Information Technology, Delhi
1 paper at NeurIPS 2025
We introduce SPRO (Self-Play Reward Optimization), an annotation-free framework that aligns images with human preferences. It uses vision-language models and reward signals to optimize both prompts and images through self-play.