MS student, University of Cambridge
1 paper at NeurIPS 2025
Using crosscoders (SAE variant) for chat-tuning concept identification, we diagnose spurious chat-only concepts arising from L1 loss artifacts and show BatchTopK robustly reveals genuine, interpretable ones.