3 papers across 3 sessions
A multi-scale, multi-fidelity Bayesian Optimization (BO) approach in which the data mixture, model scale, and training steps are selected adaptively, achieving >2.6x speedups over multi-fidelity BO and random-search baselines (a minimal BO sketch appears below).
Scaling laws that predict the loss of models trained on a mixture of source domains (one plausible functional form is sketched below).
Nemotron-CLIMB automates data mixture optimization for pre-training, improving domain adaptation and outperforming Llama-3.2-1B by 2.0% on general reasoning.
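
The first entry describes a multi-fidelity BO loop that jointly picks a data mixture, a model scale, and a step budget. The sketch below is only an illustration of that idea under stated assumptions: a single GP surrogate over (mixture weights, scale, steps), cost-aware expected improvement as the acquisition, and a synthetic `proxy_loss` standing in for real training runs. None of these choices are taken from the paper.

```python
"""Minimal sketch: cost-aware multi-fidelity BO over (data mixture, model scale, steps).

The surrogate, acquisition, candidate grid, and the synthetic `proxy_loss`
objective are illustrative assumptions, not the paper's implementation.
"""
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
N_DOMAINS = 4                        # mixture weights over 4 source domains (assumed)
SCALES = np.array([0.1, 0.4, 1.0])   # relative model scale; 1.0 = target fidelity
STEPS = np.array([0.25, 0.5, 1.0])   # relative training-step budget

def proxy_loss(x):
    """Synthetic stand-in for the validation loss of a run configured by x."""
    w, scale, steps = x[:N_DOMAINS], x[-2], x[-1]
    ideal = np.array([0.4, 0.3, 0.2, 0.1])          # assumed 'best' mixture
    return 2.0 + np.sum((w - ideal) ** 2) + 0.5 / (scale * steps) + 0.02 * rng.normal()

def sample_candidates(n):
    """Random candidates: Dirichlet mixture weights crossed with the fidelity grid."""
    w = rng.dirichlet(np.ones(N_DOMAINS), size=n)
    scale = rng.choice(SCALES, size=(n, 1))
    steps = rng.choice(STEPS, size=(n, 1))
    return np.hstack([w, scale, steps])

def cost(X):
    """Rough training cost, taken proportional to scale * steps."""
    return X[:, -2] * X[:, -1]

# Initial design, then iterate: fit GP -> score candidates by EI per unit cost.
X = sample_candidates(8)
y = np.array([proxy_loss(x) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X, y)
    cand = sample_candidates(512)
    mu, sd = gp.predict(cand, return_std=True)
    best = y.min()
    imp = best - mu                                   # minimising loss
    z = imp / np.maximum(sd, 1e-9)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)         # expected improvement
    score = ei / cost(cand)                           # cost-aware: EI per unit cost
    x_next = cand[np.argmax(score)]
    X = np.vstack([X, x_next])
    y = np.append(y, proxy_loss(x_next))

best_idx = np.argmin(y)
print("best mixture:", np.round(X[best_idx, :N_DOMAINS], 3),
      "scale:", X[best_idx, -2], "steps:", X[best_idx, -1],
      "loss:", round(float(y[best_idx]), 3))
```

The cost-aware acquisition is what lets cheap low-fidelity runs (small scale, few steps) absorb most of the exploration, which is the intuition behind the reported speedups.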
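The second entry refers to scaling laws for loss under a domain mixture. One plausible parametric form, assumed here for illustration rather than taken from the paper, extends a Chinchilla-style law with per-domain terms weighted by the mixture proportions $w_i$:

```latex
L(N, D, \mathbf{w}) \;\approx\; E \;+\; \sum_{i=1}^{k} w_i \left( \frac{A_i}{N^{\alpha}} + \frac{B_i}{D^{\beta}} \right),
\qquad \sum_{i=1}^{k} w_i = 1,\; w_i \ge 0,
```

where $N$ is parameter count, $D$ is training tokens, and $E, A_i, B_i, \alpha, \beta$ are coefficients fitted on small-scale runs (all symbols are assumptions of this sketch). Once fitted, such a form can be minimised over $\mathbf{w}$ on the simplex to pick a mixture before committing to a large run.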