Senior Applied Scientist, Research, Microsoft
1 paper at NeurIPS 2025
A hybrid architecture with linear pre-filling complexity and up to 10x higher decoding throughput.