Researcher, Microsoft
5 papers at NeurIPS 2025
A hybrid architecture with linear prefilling complexity and up to 10x higher decoding throughput.
We use reinforcement learning to train LLMs to generate high-quality code, with rewards derived from program analysis.
We show that a single training example is enough for RLVR (reinforcement learning with verifiable rewards) to significantly improve LLMs on math tasks.