Researcher, Beijing Baichuan Intelligence Technology Co., Ltd.
1 paper at NeurIPS 2025
Training LLMs to combine reasoning with external knowledge retrieval via reinforcement learning (RL), without any supervised data on reasoning steps.