Associate Professor, University of California, Berkeley
3 papers at NeurIPS 2025
We developed new methods to refine and falsify explanations of sparse autoencoder features, yielding higher-quality interpretations of large language models.