MS student, Tsinghua University
1 paper at NeurIPS 2025
We propose a general reinforcement learning framework for interleaved multimodal tasks that permutes the order of images in the input sequence to simulate varied positional relationships, encouraging the model to explore greater spatial and positional diversity.
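The permutation step can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Image` placeholder, the `permute_images` helper, and the choice to shuffle images among their existing slots while leaving the text segments in place are all hypothetical, chosen only to show what "permuting an image sequence" inside an interleaved input could look like.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Image:
    """Hypothetical placeholder for one image in an interleaved sequence."""
    name: str


# An interleaved sequence mixes text segments and images.
Item = Union[str, Image]


def permute_images(seq: List[Item], rng: random.Random) -> Tuple[List[Item], List[int]]:
    """Shuffle images among their existing slots; text stays where it is.

    Hypothetical helper. Returns the new sequence and `perm`, where perm[k]
    is the original image index (counting images only) of the image that
    now occupies the k-th image slot. The input list is not modified.
    """
    # Positions in the sequence that currently hold images.
    slots = [i for i, item in enumerate(seq) if isinstance(item, Image)]
    perm = list(range(len(slots)))
    rng.shuffle(perm)
    out = list(seq)  # copy so the caller's sequence is untouched
    for k, slot in enumerate(slots):
        out[slot] = seq[slots[perm[k]]]
    return out, perm


if __name__ == "__main__":
    seq = ["Compare", Image("a"), "with", Image("b"), "and", Image("c")]
    new_seq, perm = permute_images(seq, random.Random(0))
    print(new_seq, perm)
```

Returning `perm` alongside the shuffled sequence is a design assumption: any label or reward that depends on image position (for example, "which image shows X?") would need to be remapped with it, which the original summary does not specify.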