Researcher, Forschungszentrum Juelich GmbH
3 papers at NeurIPS 2025
We use scaling law derivation to compare open language-vision foundation models (CLIP, MaMMUT) and datasets (DataComp-1.4B, Re-LAION-1.4B, DFN-1.4B), identifying models and datasets that promise stronger scalability in pre-training.
ChemPile is a large and diverse collection of chemical data for the study and development of chemical foundation models.
This paper presents a holistic and approximate normalization approach that accelerates GPT training by up to 40% while eliminating the need for weight decay and learning rate warm-up.