We propose a toy model that shows how linear truth encodings can arise in language models.