We propose an adaptive layer reuse technique that dynamically reuses intermediate features across adjacent denoising steps to enable efficient inference for text-to-video generation models.
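The sentence above states the idea only at a high level. Below is a minimal sketch of layer-output reuse across adjacent denoising steps. It relies on a reuse criterion chosen purely for illustration: a layer is skipped when the relative L2 change of its input, measured against the input at its last real computation, falls below a fixed `threshold`. This is not necessarily the adaptive rule used in the work described. `rel_change`, `ReusableLayer`, `denoise`, the toy layers, and the latent update are all hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def rel_change(a: Vector, b: Vector) -> float:
    """Relative L2 distance ||a - b|| / ||b||, used here as an assumed reuse metric."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm = math.sqrt(sum(y * y for y in b))
    return diff / norm if norm > 0 else math.inf


class ReusableLayer:
    """Wraps one layer and reuses its output from an earlier denoising step
    when the layer input changed by less than `threshold` (hypothetical rule)."""

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float) -> None:
        self.fn = fn
        self.threshold = threshold
        self.cached_in: Optional[Vector] = None   # input at the last real computation
        self.cached_out: Optional[Vector] = None  # output of the last real computation
        self.computed = 0
        self.reused = 0

    def __call__(self, x: Vector) -> Vector:
        if self.cached_in is not None and rel_change(x, self.cached_in) < self.threshold:
            # Skip the layer and reuse the feature from an earlier step.
            # cached_in is not refreshed here, so drift is always measured
            # against the last real computation rather than accumulating.
            self.reused += 1
            return self.cached_out
        out = self.fn(x)
        self.cached_in, self.cached_out = x, out
        self.computed += 1
        return out


def denoise(x: Vector, layers: List[ReusableLayer], num_steps: int,
            step_size: float = 0.1) -> Vector:
    """Toy denoising loop; the latent update is a placeholder, not a real scheduler."""
    for _ in range(num_steps):
        h = x
        for layer in layers:
            h = layer(h)
        x = [xi - step_size * hi for xi, hi in zip(x, h)]
    return x


if __name__ == "__main__":
    layers = [
        ReusableLayer(lambda v: [0.5 * t for t in v], threshold=0.1),
        ReusableLayer(lambda v: [t + 0.1 for t in v], threshold=0.1),
    ]
    denoise([1.0, -1.0], layers, num_steps=10)
    for i, layer in enumerate(layers):
        print(f"layer {i}: computed={layer.computed}, reused={layer.reused}")
```

Setting `threshold=0.0` disables reuse, so every layer is recomputed at every step. A larger threshold skips more layer evaluations at the cost of using staler features.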