This paper introduces CorrectBench, the first comprehensive benchmark for systematically evaluating self-correction mechanisms in LLMs.