Researcher, Amazon
1 paper at NeurIPS 2025
We propose a compute- and memory-efficient inference framework for handling extremely long inputs with pre-trained Transformers.