Poster Session 3 · Thursday, December 4, 2025 11:00 AM → 2:00 PM
#4304

Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

NeurIPS Project Page Slides OpenReview

Abstract

Instruction-based image editing enables precise modifications via natural language prompts, but existing methods face a precision-efficiency tradeoff: fine-tuning demands massive datasets (>10M samples) and substantial computational resources, while training-free approaches suffer from weak instruction comprehension.
We address this by proposing ICEdit, which leverages the inherent comprehension and generation abilities of large-scale Diffusion Transformers (DiTs) through three key innovations:
  1. An in-context editing paradigm without architectural modifications;
  2. Minimal parameter-efficient fine-tuning for quality improvement;
  3. Early Filter Inference-Time Scaling, which uses vision-language models (VLMs) to select high-quality initial noise samples early in denoising, improving efficiency.
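The third innovation can be sketched as a best-of-N early filter: run only the first few denoising steps for each candidate noise seed, have a judge score the partial results against the instruction, and continue generation from the best seed alone. This is a minimal illustrative sketch; `denoise_partial` and `vlm_score` are hypothetical stand-ins for the DiT's early denoising steps and the VLM judge, not the paper's actual interfaces:

```python
import random

def denoise_partial(seed, steps):
    # Stand-in for running the first `steps` DiT denoising steps
    # from the noise sample defined by `seed`.
    random.seed(seed)
    return [random.random() for _ in range(4)]  # a tiny mock "latent"

def vlm_score(latent, instruction):
    # Stand-in for a VLM judging how well a partial result
    # follows the edit instruction (higher is better).
    return sum(latent)

def early_filter(instruction, num_seeds=8, early_steps=4):
    """Best-of-N early filtering: rank candidate noise seeds after a few
    denoising steps and keep only the highest-scoring one, so the full
    (expensive) denoising trajectory runs just once."""
    candidates = [(s, denoise_partial(s, early_steps)) for s in range(num_seeds)]
    best_seed, _ = max(candidates, key=lambda c: vlm_score(c[1], instruction))
    return best_seed
```

The design point is that filtering happens early: discarding weak seeds after a handful of steps costs far less than denoising every candidate to completion.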
Experiments show that ICEdit achieves state-of-the-art editing performance with only 0.1% of the training data and 1% of the trainable parameters used by previous methods. Our approach establishes a new paradigm for balancing precision and efficiency in instructional image editing.
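The ~1% trainable-parameter claim is typical of low-rank adapter fine-tuning, where a frozen weight matrix is augmented with two small factors. A quick back-of-the-envelope calculation shows the ratio; the dimensions and rank below are illustrative assumptions, not the paper's configuration:

```python
def lora_param_counts(d_in, d_out, rank):
    """Parameter arithmetic for low-rank adaptation of one frozen
    d_in x d_out weight: only A (d_in x rank) and B (rank x d_out)
    are trained, while the original matrix stays frozen."""
    frozen = d_in * d_out              # original weight, not updated
    trainable = rank * (d_in + d_out)  # the two low-rank factors
    return trainable, frozen, trainable / frozen

# Hypothetical example: a 3072x3072 projection adapted at rank 32.
trainable, frozen, ratio = lora_param_counts(3072, 3072, 32)
```

At rank 32 this trains 196,608 parameters against 9,437,184 frozen ones, roughly 2% per adapted layer; smaller ranks or adapting fewer layers push the overall fraction toward the ~1% regime.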