Dhanya Sridhar

Assistant Professor, Université de Montréal and Mila-Quebec AI Institute

1 paper at NeurIPS 2025

Homepage· OpenReview· Semantic Scholar· Google Scholar

Poster Session 6

1 paper

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Causal Differentiating Concepts: Interpreting LM Behavior via Causal Representation Learning

#2603 Spotlight · Navita Goyal, Hal Daumé III, Alexandre Drouin, Dhanya Sridhar

This paper introduces an unsupervised method that disentangles interpretable latent concepts in language model activations that mediate behavior, assuming that sparse changes in these concepts can induce changes in model behavior.