logo
today local_bar
Poster Session 2 · Wednesday, December 3, 2025 4:30 PM → 7:30 PM
#4315

Native-Resolution Image Synthesis

NeurIPS Project Page Slides Poster OpenReview

Abstract

We introduce native-resolution image synthesis, a novel paradigm in generative modeling capable of synthesizing images at arbitrary resolutions and aspect ratios. This approach overcomes the limitations of standard fixed-resolution, square-image methods by inherently handling variable-length visual tokens—a core challenge for conventional techniques.
To this end, we propose the Native-resolution diffusion Transformer (NiT), an architecture that explicitly models varying resolutions and aspect ratios within its denoising process. Unconstrained by fixed formats, NiT learns intrinsic visual distributions from images encompassing a wide range of resolutions and aspect ratios.
Notably, a single NiT model simultaneously achieves the state-of-the-art performance on both ImageNet-256x256 and 512x512 benchmarks. Surprisingly, akin to the robust zero-shot capabilities seen in advanced Large Language Models, NiT, pretrained solely on ImageNet, demonstrates excellent zero-shot generalization performance. It successfully generates high-fidelity images at previously unseen high resolutions (e.g., 1024x1024, 1536x1536) and diverse aspect ratios (e.g., 16:9, 3:1, 4:3), as shown in Figure 1.
These findings indicate the significant potential of native-resolution modeling as a bridge between visual generative modeling and advanced LLM methodologies.
Poster