Researcher, Nebius AI
1 paper at NeurIPS 2025
We automatically collect software engineering tasks from GitHub at scale, build a decontaminated SWE agent benchmark from them, and discover contamination in some well-known LLMs.