PhD student, University of California, Berkeley
1 paper at NeurIPS 2025
We developed new methods for refining and falsifying sparse autoencoder feature explanations, yielding higher-quality interpretations of large language model internals.