Multi-SWE-bench, a multilingual issue-resolving benchmark of 2,132 human-validated GitHub issues spanning 8 widely used programming languages.