1 paper across 1 session
We introduce SPRO (Self-Play Reward Optimization), an annotation-free framework that aligns images with human preferences by using vision-language models and reward signals to optimize prompts and images via self-play.