1 paper across 1 session
A data construction pipeline and a Diffusion Transformer (DiT) framework for subject-driven video customization under multimodal control conditions