Research Scientist, Google
2 papers at NeurIPS 2025
We propose a new framework and a set of evaluation criteria for assessing the utility of text embeddings used in data selection for pretraining language models.