2 papers across 2 sessions
RoMAE is a single, drop-in masked autoencoder that utilizes continuous (axial) RoPE to excel in interpolation and representation learning across modalities.