Researcher, Google DeepMind
2 papers at NeurIPS 2025
Using crosscoders (a sparse-autoencoder variant) to identify concepts introduced by chat tuning, we show that many apparent chat-only concepts are spurious artifacts of the L1 sparsity loss, and that training with BatchTopK instead robustly reveals genuine, interpretable ones.
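The core difference between the two sparsity mechanisms mentioned above can be sketched briefly: an L1 penalty shrinks all latent activations toward zero, while BatchTopK keeps only the k largest activations per sample, selected jointly across the whole batch. The following is a minimal NumPy illustration of that selection step; the function name, shapes, and values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def batch_topk(acts: np.ndarray, k_per_sample: int) -> np.ndarray:
    """Zero all but the (k_per_sample * batch_size) largest activations,
    selected across the entire batch rather than per sample.

    Illustrative sketch of the BatchTopK idea: the sparsity budget is
    shared batch-wide, so individual samples may keep more or fewer than
    k_per_sample active latents.
    """
    batch_size = acts.shape[0]
    k_total = k_per_sample * batch_size
    flat = acts.ravel()
    # Indices of the k_total largest entries across the whole batch.
    keep = np.argpartition(flat, -k_total)[-k_total:]
    mask = np.zeros_like(flat, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(acts.shape), acts, 0.0)
```

Because the budget is global, a sample with strong features can use more active latents than a sample with weak ones, which is one reason this selection rule behaves differently from a per-sample L1 penalty.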
VLMs often recall facts worse than their LLM backbones because visual representations form too late in the forward pass to trigger the LLM's factual-recall circuits.