We propose RePIC, a novel reinforcement learning (RL)-based post-training framework for multimodal large language models (MLLMs), designed for the personalized image captioning task. Our method significantly outperforms supervised fine-tuning (SFT)-based baselines on multi-concept personalized image captioning.