Large Language Models; Reinforcement Learning; Math Reasoning; Gradient Variance Minimization
1 paper across 1 session
Poster Session 5
1 paper
Friday, December 5, 2025 · 11:00 AM → 2:00 PM
Exhibit Hall C,D,E
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
#1904
Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang