PhD student, EPFL - EPF Lausanne
1 paper at NeurIPS 2025
OSKAR is a self-supervised multimodal model that predicts the latent features of masked tokens from video, skeleton, and text, using momentum target encoders to produce the prediction targets. It outperforms specialized models across tasks without relying on reconstruction or contrastive losses.
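To make the idea of masked latent prediction with a momentum target encoder concrete, here is a minimal NumPy sketch. It is not OSKAR's implementation: the single linear "encoder", the mean-squared-error loss, the momentum value, and the names `ema_update` and `masked_latent_loss` are all assumptions chosen for illustration.

```python
import numpy as np

def ema_update(target_params, online_params, momentum=0.99):
    """Move each target parameter toward its online counterpart (exponential moving average)."""
    return {
        name: momentum * target_params[name] + (1.0 - momentum) * online_params[name]
        for name in target_params
    }

def masked_latent_loss(predicted, target, mask):
    """Mean squared error between predicted and target latents, averaged over masked tokens only.

    predicted, target: arrays of shape (num_tokens, dim)
    mask: boolean array of shape (num_tokens,), True where the token was masked
    """
    per_token = np.mean((predicted - target) ** 2, axis=-1)
    return float(np.sum(per_token * mask) / max(int(np.sum(mask)), 1))

rng = np.random.default_rng(0)

# Hypothetical toy encoders: one linear map each, with the target starting as a copy.
online = {"w": rng.normal(size=(8, 4))}
target = {"w": online["w"].copy()}

# Six toy tokens standing in for video, skeleton, or text tokens.
tokens = rng.normal(size=(6, 8))
mask = np.array([True, False, True, False, False, True])

# The target encoder sees the full input; its outputs are the prediction targets.
target_latents = tokens @ target["w"]

# The online encoder sees the input with masked tokens zeroed out.
predicted_latents = (tokens * ~mask[:, None]) @ online["w"]

# The loss is computed in latent space, on masked positions only.
loss = masked_latent_loss(predicted_latents, target_latents, mask)

# Stand-in for a gradient step on the online encoder, followed by the momentum update.
online["w"] = online["w"] - 0.1
target = ema_update(target, online, momentum=0.99)
```

The point of the sketch is that the training signal is a distance between latent features, so no input reconstruction and no negative pairs are involved, and that the target encoder is never trained directly but only trails the online encoder through the moving average.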