PhD student, University of Hong Kong
2 papers at NeurIPS 2025
This work uses vision foundation models to build a visual tokenizer that is trained end-to-end for autoregressive (AR) image generation, achieving state-of-the-art results on $256\times256$ class-to-image generation on ImageNet.
This paper introduces a unified visual tokenizer that brings visual generation and understanding together within a single autoregressive framework.