1 paper across 1 session
Training LLMs to combine reasoning with external knowledge retrieval via reinforcement learning (RL), without any supervised data on reasoning steps.