We elicit model-generated constitutions and train language models to intrinsically self-correct their responses according to these principles; repeating this process iteratively enables self-improvement.
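
To make the iterative loop concrete, here is a minimal sketch of one self-improvement round: generate, critique against the constitution, revise, then fine-tune on the revised outputs. Every name here (`Model`, `generate`, `critique`, `revise`, `finetune`, `self_improve`) is a hypothetical stand-in for illustration, not the paper's actual interface.

```python
from typing import List, Tuple


class Model:
    """Hypothetical language-model wrapper (placeholder logic only)."""

    def generate(self, prompt: str) -> str:
        # Placeholder: a real model would sample a response here.
        return f"draft answer to: {prompt}"

    def critique(self, response: str, principles: List[str]) -> str:
        # Placeholder: ask the model which principles the response violates.
        return f"critique of {response!r} against {len(principles)} principles"

    def revise(self, response: str, critique: str) -> str:
        # Placeholder: the model rewrites its response given the critique.
        return f"revised({response})"

    def finetune(self, pairs: List[Tuple[str, str]]) -> None:
        # Placeholder: train on (prompt, revised response) pairs so the
        # corrected behavior becomes intrinsic to the model.
        pass


def self_improve(model: Model, prompts: List[str],
                 principles: List[str], iterations: int = 3) -> Model:
    """Repeat critique -> revise -> finetune; each round trains on the
    previous round's self-corrected outputs."""
    for _ in range(iterations):
        pairs = []
        for prompt in prompts:
            response = model.generate(prompt)
            critique = model.critique(response, principles)
            pairs.append((prompt, model.revise(response, critique)))
        model.finetune(pairs)
    return model
```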
This paper introduces CorrectBench, the first comprehensive benchmark for systematically evaluating self-correction mechanisms in LLMs.
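
One way such a benchmark can quantify self-correction is to score a model's answers before and after a correction pass and report both accuracies. The sketch below assumes a simple (question, gold answer) task format and exact-match scoring; these are illustrative assumptions, not CorrectBench's actual protocol.

```python
from typing import Callable, List, Tuple


def score_self_correction(
    answer: Callable[[str], str],        # initial-answer function (hypothetical)
    correct: Callable[[str, str], str],  # self-correction function (hypothetical)
    tasks: List[Tuple[str, str]],        # (question, gold answer) pairs
) -> Tuple[float, float]:
    """Return (initial accuracy, post-correction accuracy)."""
    initial_hits = corrected_hits = 0
    for question, gold in tasks:
        first = answer(question)
        revised = correct(question, first)
        initial_hits += first.strip() == gold
        corrected_hits += revised.strip() == gold
    n = len(tasks)
    return initial_hits / n, corrected_hits / n
```

Comparing the two numbers separates raw task ability from the gain (or loss) attributable to the self-correction mechanism itself.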