2 papers across 1 session
We introduce rStar-Coder for training advanced code-reasoning LLMs; our 14B model achieves performance comparable to QwQ-32B.
Current LLM code evaluation is undermined by weak test cases. We propose SAGA, a method that leverages human expertise to generate stronger verifiers, enabling more reliable assessment; we demonstrate it with our new CodeComPass benchmark and TCGCoder-7B model.
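To see why weak test cases make evaluation unreliable, consider a minimal sketch (the function and test values below are hypothetical illustrations, not from either paper): a buggy maximum-subarray solution passes a weak test suite but is exposed by a stronger one that includes an adversarial edge case.

```python
def buggy_max_subarray(nums):
    # Kadane's algorithm with a common bug: initializing at 0 means
    # all-negative inputs wrongly return 0 instead of the largest element.
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

# Weak suite: only "easy" inputs with positive sums.
weak_tests = [([1, 2, 3], 6), ([2, -1, 3], 4)]
# Stronger suite adds the all-negative edge case that catches the bug.
strong_tests = weak_tests + [([-5, -2, -9], -2)]

def passes(solution, tests):
    # A solution "passes" a suite if it matches every expected output.
    return all(solution(nums) == expected for nums, expected in tests)

print(passes(buggy_max_subarray, weak_tests))    # True: bug goes undetected
print(passes(buggy_max_subarray, strong_tests))  # False: stronger suite exposes it
```

A benchmark graded only on the weak suite would mark the buggy solution correct, which is exactly the failure mode that stronger, expert-informed verifiers aim to eliminate.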