Researcher, Microsoft AI
3 papers at NeurIPS 2025
We introduce Alternating Gradient Flows (AGF), a framework that models feature learning in two-layer networks with small initialization as alternating utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features.
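As a rough schematic of the alternating picture (not the paper's exact statement): between saddles, dormant neurons grow in the direction that maximizes a utility, and once a neuron activates, the network minimizes the loss (cost) over the features acquired so far. The symbols $U$, $\mathcal{L}$, $w$, and $\theta$ below are illustrative placeholders.

```latex
% Schematic only: one dormant/active alternation step under small initialization.
\[
  w_{\text{dormant}} \;\leftarrow\; \arg\max_{\|w\| = 1} \; U(w),
  \qquad
  \theta_{\text{active}} \;\leftarrow\; \arg\min_{\theta} \; \mathcal{L}(\theta).
\]
```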
We propose an informed corrector for masked discrete diffusion that reduces approximation errors, enabling faster sampling and better sample quality in both synthetic and large-scale settings.
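A minimal sketch of corrector-style sampling for masked discrete diffusion, assuming a denoiser `model` that maps a partially masked sequence to per-position logits: after each predictor (unmasking) step, re-mask the least-confident committed tokens so later steps can revise them. All names (`model`, `mask_id`, `correct_frac`) are hypothetical, and using raw confidences as the correction signal is a crude stand-in, not the paper's informed corrector.

```python
import torch


@torch.no_grad()
def sample_with_corrector(model, length, mask_id, num_steps=16,
                          correct_frac=0.1, device="cpu"):
    # Start from a fully masked sequence of the target length.
    x = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(x)                        # (1, length, vocab)
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence
        # Predictor: commit the model's prediction at masked positions.
        x = torch.where(x == mask_id, pred, x)
        # Corrector: re-mask the least-confident committed tokens so the
        # next step can revisit them (here "informed" only by confidence).
        if step < num_steps - 1:
            n = int(length * correct_frac)
            if n > 0:
                worst = conf[0].argsort()[:n]
                x[0, worst] = mask_id
    return x
```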
We show that limiting a model's confidence during training can improve test-time scaling in mathematical reasoning.
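One simple way to limit confidence during training is an entropy bonus on the output distribution, in the spirit of confidence penalties; the sketch below assumes this mechanism and a hypothetical weight `beta`, and the paper's actual method may differ.

```python
import torch
import torch.nn.functional as F


def confidence_limited_loss(logits: torch.Tensor,
                            targets: torch.Tensor,
                            beta: float = 0.1) -> torch.Tensor:
    """Cross-entropy minus a scaled entropy bonus.

    Larger `beta` (hypothetical) keeps the output distribution flatter,
    i.e. the model is rewarded for remaining less confident.
    """
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy  # higher entropy lowers the loss


if __name__ == "__main__":
    logits = torch.randn(8, 10)            # toy batch: 8 examples, 10 classes
    targets = torch.randint(0, 10, (8,))
    print(confidence_limited_loss(logits, targets).item())
```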