Senior Research Scientist, Sea AI Lab
6 papers at NeurIPS 2025
AnytimeReasoner optimizes LLM reasoning under variable token budgets by introducing verifiable dense rewards and a variance-reduction method (BRPO), making RL more efficient for both final and anytime reasoning performance.
We propose a lifelong safety alignment framework where a Meta-Attacker and Defender co-evolve to uncover and defend against unseen jailbreaking strategies.
FOA-Attack enhances targeted adversarial transferability to closed-source MLLMs by optimally aligning global and local features.
NoisyRollout boosts VLM reasoning by mixing clean and noisy inputs during RL, improving generalization at no extra cost.