PhD student, Princeton University
1 paper at NeurIPS 2025
We show that transformers achieve length generalization when trained jointly on a shorter main task and longer auxiliary tasks.