3 papers across 2 sessions
This paper introduces an efficient temporal in‑context fine‑tuning framework that enables pretrained video diffusion models to ingest diverse conditioning signals, granting precise and versatile control without any architectural changes.
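The abstract does not spell out the mechanism, but "temporal in-context" conditioning typically means rendering the conditioning signal as extra frames and concatenating them with the target clip along the time axis, so the pretrained backbone ingests them with no new input channels or adapters. Below is a minimal sketch of that idea; `temporal_in_context_step`, the `backbone` signature, and all shapes are hypothetical illustrations, not the paper's actual interface.

```python
import torch

def temporal_in_context_step(backbone, x_target, cond_frames, t):
    """One hypothetical denoising step with temporal in-context conditioning.

    backbone    -- pretrained video diffusion denoiser, used as-is
                   (no architectural changes); its signature is assumed here.
    x_target    -- noisy target clip, shape (B, C, T, H, W)
    cond_frames -- conditioning signal rendered as frames (e.g. depth or
                   pose maps), shape (B, C, Tc, H, W)
    t           -- diffusion timestep tensor, shape (B,)
    """
    # Concatenate condition frames and noisy target frames along the
    # temporal axis, so conditions appear to the model as extra frames.
    x_in = torch.cat([cond_frames, x_target], dim=2)  # (B, C, Tc+T, H, W)

    # Run the unmodified backbone over the extended sequence.
    eps_pred = backbone(x_in, t)

    # Keep predictions for the target frames only; the condition frames
    # are pure context and would receive no loss during fine-tuning.
    return eps_pred[:, :, cond_frames.shape[2]:]
```

Because the conditions travel through the same frame interface the model was pretrained on, fine-tuning only needs to teach the backbone to read them as context, which is what makes the approach architecture-free.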
We propose a method to speed up video diffusion generation through efficient attention.
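The abstract does not say which efficient-attention scheme is used. As one illustrative possibility only, the sketch below shows windowed (local) attention, a common way to cut the quadratic cost of full attention over long video token sequences; `windowed_attention` and its shape conventions are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def windowed_attention(q, k, v, window):
    """Local attention over fixed-size windows of the token sequence.

    q, k, v -- tensors of shape (B, heads, L, D), with L divisible by
               `window` (video tokens flattened over space and time).
    window  -- number of tokens each query may attend to.

    Full attention costs O(L^2); restricting each token to its own
    window reduces this to O(L * window).
    """
    B, H, L, D = q.shape
    n = L // window
    # Fold the sequence into (B, heads, n_windows, window, D) so that
    # attention runs independently inside each window.
    q, k, v = (x.reshape(B, H, n, window, D) for x in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return out.reshape(B, H, L, D)
```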