1 paper across 1 session
We propose a novel architecture and training objective specifically designed to upsample features from foundation vision encoders at any resolution.