1 paper across 1 session
Jointly modeling image laments and high-level semantic features (e.g. DINO), significantly enhances generative quality and training efficiency.