Researcher, Microsoft
3 papers at NeurIPS 2025
A hybrid architecture with linear pre-filling complexity and up to 10x higher throughput on decoding.
We need only one example for RLVR on LLMs to achieve significant improvement on math tasks.