Postdoc, New York University
3 papers at NeurIPS 2025
We propose a toy model that shows how linear truth encodings can arise in language models.
We show that transformers, RNNs, and CoT-augmented transformers must each scale specific hyperparameters with input size to solve compositional reasoning problems, revealing distinct strengths and trade-offs across architectures.
We show that constant-depth transformers with linear width can solve many graph problems, revealing a trade-off in which greater width enables shallower, faster models, though some tasks still demand quadratic width.