Associate Professor, New York University
1 paper at NeurIPS 2025
We propose a toy model that shows how linear truth encodings can arise in language models.