Principal Researcher, Research, Microsoft
1 paper at NeurIPS 2025
We show that transformers achieve length generalization when trained jointly on a shorter main task and longer auxiliary tasks.