2 papers across 2 sessions
This paper shows that self-verification prevents model collapse in recursive training, without relying on real data.
We introduce an RL framework that trains an LLM's reasoning and self-verification abilities simultaneously.