We propose a sparse autoencoder that maps the semantics of vision and language representations into a unified concept set.
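To make the idea concrete, the following is a minimal sketch of one way a single sparse autoencoder could be shared between modalities. It is not the authors' implementation. It assumes that image and text embeddings already live in a common space of the same dimension (as with a contrastively trained dual encoder), and it uses top-k sparsity as an illustrative choice. All names (`init_sae`, `encode`, `decode`, `k`, `n_concepts`) are hypothetical.

```python
import numpy as np

def init_sae(dim, n_concepts, seed=0):
    """Randomly initialise one autoencoder shared by both modalities (assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, n_concepts)),
        "b_enc": np.zeros(n_concepts),
        "W_dec": rng.normal(0.0, 1.0 / np.sqrt(n_concepts), size=(n_concepts, dim)),
        "b_dec": np.zeros(dim),
    }

def encode(params, x, k):
    """Map embeddings of shape (batch, dim) to sparse concept activations (batch, n_concepts)."""
    pre = np.maximum((x - params["b_dec"]) @ params["W_enc"] + params["b_enc"], 0.0)
    # Keep only the k largest activations per row (top-k sparsity, an assumed choice).
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    codes = np.zeros_like(pre)
    np.put_along_axis(codes, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return codes

def decode(params, codes):
    """Reconstruct embeddings from concept activations."""
    return codes @ params["W_dec"] + params["b_dec"]

def reconstruction_loss(params, x, k):
    """Mean squared reconstruction error for a batch of embeddings."""
    return float(np.mean((decode(params, encode(params, x, k)) - x) ** 2))

# Placeholder embeddings standing in for image and text encoder outputs.
rng = np.random.default_rng(1)
image_emb = rng.normal(size=(4, 16))
text_emb = rng.normal(size=(4, 16))

params = init_sae(dim=16, n_concepts=64)
image_codes = encode(params, image_emb, k=8)
text_codes = encode(params, text_emb, k=8)
```

Because both modalities pass through the same encoder and decoder weights in this sketch, each column of `image_codes` and `text_codes` refers to the same concept index, which is what allows the two sets of activations to be compared directly. Training (for example, minimising `reconstruction_loss` over both modalities) is omitted.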