Researcher, Google DeepMind
2 papers at NeurIPS 2025
Using crosscoders (a sparse-autoencoder variant) to identify concepts introduced by chat tuning, we show that many apparent chat-only concepts are spurious artifacts of the L1 sparsity loss, and that training with BatchTopK instead robustly reveals genuine, interpretable ones.
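The core difference between the two sparsity mechanisms mentioned above can be sketched briefly: an L1 penalty shrinks all latent activations toward zero, while BatchTopK keeps only the k largest activations per sample, selected jointly across the whole batch. The following is a minimal NumPy illustration of that selection step; the function name, shapes, and values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def batch_topk(acts: np.ndarray, k_per_sample: int) -> np.ndarray:
    """Zero all but the (k_per_sample * batch_size) largest activations,
    selected across the entire batch rather than per sample.

    Illustrative sketch of the BatchTopK idea: the sparsity budget is
    shared batch-wide, so individual samples may keep more or fewer than
    k_per_sample active latents.
    """
    batch_size = acts.shape[0]
    k_total = k_per_sample * batch_size
    flat = acts.ravel()
    # Indices of the k_total largest entries across the whole batch.
    keep = np.argpartition(flat, -k_total)[-k_total:]
    mask = np.zeros_like(flat, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(acts.shape), acts, 0.0)
```

Because the budget is global, a sample with strong features can use more active latents than a sample with weak ones, which is one reason this selection rule behaves differently from a per-sample L1 penalty.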
VLMs often recall facts worse than their LLM backbones because visual representations form too late in the forward pass to trigger the LLM's factual-recall circuits.