1 paper across 1 session
A new benchmark for assessing the capabilities of vision-language models (VLMs) in real-world video game code assurance tasks.