Assistant Professor, Mohamed bin Zayed University of Artificial Intelligence
1 paper at NeurIPS 2025
We find that transformer key-value memories are nearly as interpretable as sparse autoencoder (SAE) features.