MS student, Yonsei University
1 paper at NeurIPS 2025
Scaling up the variance of text-token embeddings in MM-DiTs before joint attention lets rare prompts emerge without retraining, extra data, or denoising tweaks, improving text-to-image generation, video generation, and editing.
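
A minimal sketch of what a variance scale-up could look like, assuming it means pushing each text-token embedding away from the mean over tokens by a constant factor before the tokens enter joint attention. The function name `scale_up_variance`, the `factor` parameter, and the choice of a per-dimension mean are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def scale_up_variance(text_tokens: List[List[float]], factor: float) -> List[List[float]]:
    """Hypothetical helper: widen the spread of text-token embeddings.

    Each token's deviation from the per-dimension mean (taken over tokens)
    is multiplied by `factor`, so the per-dimension variance grows by
    factor ** 2 while the mean stays unchanged.
    """
    if not text_tokens:
        return []
    n = len(text_tokens)
    dim = len(text_tokens[0])
    # Per-dimension mean over all text tokens (assumed centering choice).
    mean = [sum(tok[d] for tok in text_tokens) / n for d in range(dim)]
    return [
        [mean[d] + factor * (tok[d] - mean[d]) for d in range(dim)]
        for tok in text_tokens
    ]


def variance(tokens: List[List[float]], d: int) -> float:
    """Population variance of dimension `d` across tokens."""
    n = len(tokens)
    m = sum(tok[d] for tok in tokens) / n
    return sum((tok[d] - m) ** 2 for tok in tokens) / n


# Toy example: two 2-dimensional "text tokens".
tokens = [[1.0, 2.0], [3.0, 6.0]]
scaled = scale_up_variance(tokens, factor=2.0)
```

In this toy setting the operation touches only the text-token embeddings and needs no model weights or training data, which is consistent with the "without retraining" framing; how the real method chooses the factor and where exactly it is applied is not specified here.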