Researcher, Ant Group
4 papers at NeurIPS 2025
We present ARGenSeg, a unified framework for multimodal understanding and pixel-level perception that achieves state-of-the-art performance in image segmentation.
This study presents the first comprehensive investigation into model merging and data mixture strategies for constructing large language models (LLMs) aligned with the 3H principles (Harmlessness, Helpfulness, Honesty).
We present LLaDA, a diffusion language model trained from scratch that is competitive with LLaMA 3 in performance.
This study introduces VADB, the largest video aesthetic database with 10,490 videos, and VADB-Net, a novel framework that outperforms existing models in video aesthetic assessment.