Researcher, Microsoft
3 papers at NeurIPS 2025
We propose a general reinforcement learning framework tailored to interleaved multimodal tasks. It permutes image sequences to simulate varied positional relationships and to explore greater spatial and positional diversity.
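The data-side idea of reordering an image sequence can be illustrated with a minimal sketch. It is not the framework's actual implementation: the function names, the `answer_index` field (assuming a task whose answer refers to an image by position), and the use of uniform random shuffles are all hypothetical, and how the permuted samples enter the reinforcement learning objective is not described in the summary above.

```python
import random
from typing import List, Sequence, Tuple


def permute_images(
    images: Sequence[str],
    answer_index: int,
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Shuffle the image order and track where the answer image ends up.

    Hypothetical helper: `images` holds image identifiers and
    `answer_index` is the position of the image the answer refers to.
    """
    order = list(range(len(images)))
    rng.shuffle(order)  # order[new_pos] == old_pos
    permuted = [images[old_pos] for old_pos in order]
    # The answer must be remapped so it still points at the same image.
    new_answer_index = order.index(answer_index)
    return permuted, new_answer_index


def make_variants(
    images: Sequence[str],
    answer_index: int,
    num_variants: int,
    seed: int = 0,
) -> List[Tuple[List[str], int]]:
    """Build several reordered copies of one sample (assumed uniform shuffles)."""
    rng = random.Random(seed)
    return [permute_images(images, answer_index, rng) for _ in range(num_variants)]


if __name__ == "__main__":
    for permuted, idx in make_variants(["a.png", "b.png", "c.png"], 1, 4):
        print(permuted, "answer at position", idx)
```

Each variant contains the same images in a different order, with the position-dependent answer remapped accordingly, so a model sees the same content under different positional relationships.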