Associate Professor, Indraprastha Institute of Information Technology, Delhi
2 papers at NeurIPS 2025
We introduce SPRO (Self-Play Reward Optimization), an annotation-free framework that aligns images with human preferences by using vision-language models and reward signals to optimize prompts and images via self-play.
Steering given distributions towards ideal distributions, where fairness and accuracy do not trade off against each other.