Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#4414

Improved Training Technique for Shortcut Models

NeurIPS · Project Page · OpenReview

Abstract

Shortcut models represent a promising, non-adversarial paradigm for generative modeling, uniquely supporting one-step, few-step, and multi-step sampling from a single trained network. However, their widespread adoption has been stymied by critical performance bottlenecks.
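For concreteness, here is a minimal sketch of how one network conditioned on step size supports all three sampling budgets. It assumes a velocity-style shortcut network s(x, t, d) with time running from noise at t = 0 to data at t = 1, in the spirit of the original shortcut-model formulation; the function and argument names are illustrative, not an official API.

```python
import torch

@torch.no_grad()
def shortcut_sample(s, x, num_steps):
    """Sample with a shortcut model s(x, t, d) in `num_steps` Euler-style jumps.

    Because the network is conditioned on the step size d, the same trained
    weights cover num_steps = 1 (one-step), small num_steps (few-step), and
    large num_steps (multi-step) generation. Names are illustrative.
    """
    d = 1.0 / num_steps  # uniform step size on t in [0, 1]
    for i in range(num_steps):
        t_vec = torch.full((x.shape[0],), i * d, device=x.device)
        d_vec = torch.full((x.shape[0],), d, device=x.device)
        x = x + d * s(x, t_vec, d_vec)  # jump of size d along the predicted shortcut
    return x

# Usage: start from Gaussian noise; only num_steps changes between budgets.
# x0 = torch.randn(batch, channels, height, width)
# one_step   = shortcut_sample(s, x0, num_steps=1)
# few_step   = shortcut_sample(s, x0, num_steps=4)
# multi_step = shortcut_sample(s, x0, num_steps=128)
```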
This paper tackles five core issues that have held shortcut models back:
  1. compounding guidance, a hidden flaw that we are the first to formalize and that causes severe image artifacts;
  2. inflexible fixed guidance that restricts inference-time control;
  3. a pervasive frequency bias, driven by reliance on low-level distances in the direct (pixel) domain, that skews reconstructions toward low frequencies;
  4. divergent self-consistency arising from a conflict with EMA training; and
  5. curvy flow trajectories that impede convergence.
To address these challenges, we introduce iSM, a unified training framework that systematically resolves each limitation. Our framework is built on four key improvements:
  1. Intrinsic Guidance provides explicit, dynamic control over guidance strength at inference time, resolving both compounding guidance and the inflexibility of fixed guidance.
  2. A Multi-Level Wavelet Loss mitigates frequency bias and restores high-frequency detail (see the first sketch below).
  3. Scaling Optimal Transport (sOT) reduces training variance and learns straighter, more stable generative paths.
  4. A Twin EMA strategy reconciles training stability with self-consistency (see the second sketch below).
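As an illustration of the Multi-Level Wavelet Loss, the sketch below measures reconstruction error on Haar subbands at several scales, so high-frequency detail enters the objective explicitly instead of being averaged away by a pixel-space distance. The Haar basis, the L1 distance, the depth, and the weighting here are all assumptions for illustration; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def haar_dwt2(x):
    """One level of a 2D Haar transform on an (N, C, H, W) tensor.

    Returns the low-pass band LL plus the three detail bands (LH, HL, HH).
    Assumes H and W are even.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def multilevel_wavelet_loss(pred, target, num_levels=3, detail_weight=1.0):
    """L1 loss over Haar subbands at `num_levels` scales (illustrative)."""
    loss = 0.0
    for _ in range(num_levels):
        p_ll, *p_detail = haar_dwt2(pred)
        t_ll, *t_detail = haar_dwt2(target)
        for ph, th in zip(p_detail, t_detail):
            loss = loss + detail_weight * F.l1_loss(ph, th)
        pred, target = p_ll, t_ll          # recurse on the low-pass band
    return loss + F.l1_loss(pred, target)  # coarsest low-pass residual
```

For the Twin EMA strategy, one plausible reading is to keep two exponential moving averages of the online weights with different decay rates: a slow copy for stable evaluation and a faster copy as the self-consistency target, so the bootstrap signal does not lag too far behind training. The decay values, the roles, and the class below are hypothetical, not the paper's specification.

```python
import copy

import torch

class TwinEMA:
    """Two EMA copies of a model with different decay rates (hypothetical)."""

    def __init__(self, model, slow_decay=0.9999, fast_decay=0.99):
        self.slow = copy.deepcopy(model).eval()  # stable weights for evaluation
        self.fast = copy.deepcopy(model).eval()  # fresher weights for the self-consistency target
        self.decays = {"slow": slow_decay, "fast": fast_decay}

    @torch.no_grad()
    def update(self, model):
        for ema, decay in ((self.slow, self.decays["slow"]),
                           (self.fast, self.decays["fast"])):
            for p_ema, p in zip(ema.parameters(), model.parameters()):
                # p_ema <- decay * p_ema + (1 - decay) * p
                p_ema.lerp_(p, 1.0 - decay)
```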
Extensive experiments on ImageNet 256×256 demonstrate that our approach yields substantial FID improvements over baseline shortcut models across one-step, few-step, and multi-step generation, making shortcut models a viable and competitive class of generative models.