2 papers across 2 sessions
We find that rerouting spurious shortcuts during adapter training enables robust disentanglement in adapter-based text-to-image generation.
We propose a pipeline that detects visual inconsistencies using visual correspondences derived from disentangled diffusion model features.