Full Professor, Tohoku University
1 paper at NeurIPS 2025
We find that transformer key-value memories are nearly as interpretable as sparse autoencoder (SAE) features.