Faculty Fellow, New York University
2 papers at NeurIPS 2025
We propose a toy model that shows how linear truth encodings can arise in language models.
We derive a closed-form solution for a linear erasure projection that preserves covariance with the main-task labels.
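The erasure step can be sketched numerically. The snippet below is a minimal illustration only, not the paper's closed-form solution: it estimates a concept direction from class-mean differences on synthetic data (all names and the data-generating setup are assumptions for the example) and applies the orthogonal projection that removes it, checking that the covariance between the representation and the concept labels collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic representations: one coordinate carries a binary concept z.
n, d = 2000, 16
z = rng.integers(0, 2, size=n)            # concept labels (e.g. true/false)
u = np.zeros(d)
u[0] = 1.0                                # ground-truth concept direction
X = rng.normal(size=(n, d)) + np.outer(z - 0.5, 2.0 * u)

# Estimate the concept direction via the class-mean difference,
# then erase it with the orthogonal projection P = I - w w^T (w unit-norm).
w = X[z == 1].mean(0) - X[z == 0].mean(0)
w /= np.linalg.norm(w)
P = np.eye(d) - np.outer(w, w)
X_erased = X @ P

# Max absolute covariance between any coordinate and z, before and after.
cov_before = np.abs(np.cov(X.T, z)[:-1, -1]).max()
cov_after = np.abs(np.cov(X_erased.T, z)[:-1, -1]).max()
print(cov_before, cov_after)
```

A covariance-preserving erasure (the paper's setting) would instead use an oblique projection chosen so that the component aligned with the main-task labels survives; the orthogonal projection above shows only the removal half of that construction.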