8 papers across 3 sessions
We propose an explainable and extendable framework to enhance deepfake detection via multimodal large language models.
We introduce SPRO (Self-Play Reward Optimization), an annotation-free framework that aligns images with human preferences by using vision-language models and reward signals to optimize prompts and images via self-play.