4 papers across 3 sessions
We introduce the "DataRater", a meta-learning approach to automatically learn the value of data and use it to improve the compute efficiency of training foundation models.
A synthetic cyclic peptide-protein complex dataset derived from AFDB that, for the first time, enables training cyclic peptide binder design models from scratch.
We introduce TabArena, the first continuously maintained living benchmarking system for machine learning on tabular data.
We introduce a learned algorithm for curating CLIP pretraining datasets that achieves state-of-the-art ImageNet accuracy on the DataComp benchmark.