Scientific Lead, Co-founder, LAION
2 papers at NeurIPS 2025
We use scaling law derivation to compare open language-vision foundation models (CLIP, MaMMUT) and pre-training datasets (DataComp-1.4B, Re-LAION-1.4B, DFN-1.4B), identifying which ones promise stronger scalability during pre-training.
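A minimal sketch of the scaling-law-fitting idea, not the paper's actual procedure: assuming a standard saturating power law L(C) = a * C^(-b) + c in compute C, one can fit it to small-scale (compute, loss) measurements for each model or dataset and extrapolate to larger budgets to see which option predicts lower loss at scale. The data points, parameter values, and names below are illustrative assumptions.

```python
# Illustrative sketch (hypothetical numbers): fit a saturating power law
# L(C) = a * C**(-b) + c to (compute, loss) pairs per candidate, then
# extrapolate to a larger compute budget to compare scalability.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(C, a, b, c):
    # Loss decays as a power of compute C toward an irreducible floor c.
    return a * np.power(C, -b) + c

# Hypothetical measurements from small-scale pre-training runs (FLOPs, loss).
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss_candidate_a = np.array([3.90, 3.50, 3.20, 3.00, 2.85])
loss_candidate_b = np.array([4.00, 3.55, 3.15, 2.90, 2.72])

fits = {}
for name, loss in [("candidate_a", loss_candidate_a), ("candidate_b", loss_candidate_b)]:
    params, _ = curve_fit(scaling_law, compute, loss, p0=(1e4, 0.2, 2.0), maxfev=20000)
    fits[name] = params

# Extrapolate both fitted curves to a larger compute budget.
target = 1e21
for name, (a, b, c) in fits.items():
    print(f"{name}: predicted loss at {target:.0e} FLOPs = {scaling_law(target, a, b, c):.3f}")
```

The candidate whose fitted curve predicts lower loss at the target budget is the one that promises stronger scalability under this (assumed) functional form.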
This paper presents a holistic, approximate normalization approach that accelerates GPT training by up to 40% while removing the need for weight decay and learning rate warm-up.