We propose a co-evolving reinforcement learning method that jointly optimizes the coder and unit tester without relying on ground-truth code supervision.
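To make the idea of training without ground-truth code concrete, the sketch below shows one way reward signals for both models could be derived purely from executing sampled programs against sampled unit tests. It is an illustrative assumption, not the reward defined in the paper: the function name `coevolution_rewards`, the pass-rate coder reward, and the majority-vote proxy label used for the tester reward are all hypothetical choices.

```python
from typing import List, Tuple


def coevolution_rewards(passed: List[List[bool]]) -> Tuple[List[float], List[float]]:
    """Derive rewards for sampled programs and sampled unit tests.

    passed[i][j] is True when candidate program i passes candidate test j.
    No reference solution is consulted (hypothetical reward design).
    """
    n_codes = len(passed)
    n_tests = len(passed[0]) if passed else 0
    if n_codes == 0 or n_tests == 0:
        return [], []

    # Coder reward (assumption): fraction of generated tests the program passes.
    code_rewards = [sum(row) / n_tests for row in passed]

    # Proxy label (assumption): a program counts as "likely correct"
    # when it passes more than half of the generated tests.
    proxy_ok = [reward > 0.5 for reward in code_rewards]

    # Tester reward (assumption): fraction of programs on which the test's
    # verdict agrees with the proxy label.
    test_rewards = [
        sum(passed[i][j] == proxy_ok[i] for i in range(n_codes)) / n_codes
        for j in range(n_tests)
    ]
    return code_rewards, test_rewards


# Three sampled programs (rows) executed against three sampled tests (columns).
matrix = [
    [True, True, False],
    [True, True, True],
    [False, False, False],
]
code_rewards, test_rewards = coevolution_rewards(matrix)
```

In a training loop, these two reward vectors would feed separate policy updates for the coder and the unit tester, so each model's progress sharpens the signal available to the other. A proxy label of this kind is noisy early in training, which is one reason a real method may weight or filter it differently.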