In this work, we show that vision foundation models such as DINOv2 can achieve fast convergence and maintain high robustness by applying a data curriculum and integrating frequency-domain data augmentation during pretraining.