VP, Kuaishou Technology
6 papers at NeurIPS 2025
This paper presents a systematic pipeline for improving video generation with human feedback, including a large-scale preference dataset, a video reward model, and three alignment algorithms for flow matching models.
We leverage the pre-trained diffusion model itself as a powerful and cost-effective step-level reward model, optimizing the diffusion model directly in the noisy latent space.
We present an emotion-centric video foundation model trained with fine-grained captions and rationales via affective-tree reasoning guidance, achieving high-level emotional intelligence for video understanding.
We propose Flow-GRPO, the first method to integrate online RL into flow matching models, significantly enhancing text-to-image generation performance.
OmniSync enables universal lip synchronization for diverse visual content using mask-free diffusion with dynamic guidance.