Undergraduate student, University of Electronic Science and Technology of China
2 papers at NeurIPS 2025
A novel alignment framework that integrates generative reward models with multimodal RLHF.
A human preference dataset for multi-turn interleaved multimodal understanding and generation tasks.