VP, Foundations of AI, International Business Machines
1 paper at NeurIPS 2025
We demonstrate scenarios in which transformer models with sparse attention learn and generalize faster, and theoretically characterize the conditions under which this occurs.