1 paper across 1 session
We find that transformer key-value memories are nearly as interpretable as sparse autoencoder (SAE) features