1 paper across 1 session
RoMAE is a single, drop-in masked autoencoder that utilizes continuous (axial) RoPE to excel in interpolation and representation learning across modalities.