1 paper across 1 session
Training a new reasoning paradigm of LLMs explicitly contains meta-thinking in a multi-agent and multi-turn setting with RL