Professor, Harvard University
2 papers at NeurIPS 2025
We combine two types of memory systems from quadratic and linear transformers into a single hybrid memory system to leverage their complementary strengths in context coverage, precise retrieval, and expressivity.
We derive well-known learning rules from an objective that casts them as policies for navigating uncertain loss landscapes.