3 papers across 2 sessions
We introduce sequence models that are equivariant to time-parameterized symmetries such as motion.
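As a rough illustration of the equivariance property (a toy, hand-built stand-in, not the paper's learned models), a centered moving average commutes with a constant-velocity translation of its input, one simple instance of a time-parameterized symmetry:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3                # sequence length and half-window (illustrative)
t = np.arange(T, dtype=float)

def smooth(x):
    # Toy sequence "model": centered moving average over 2k+1 steps.
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    return np.convolve(x, kernel, mode="same")

x = rng.normal(size=T)       # a 1-D signal standing in for a sequence
v = 0.7                      # constant velocity (illustrative)
g_x = x + v * t              # act on the input: a translation growing linearly in time

# Equivariance check, away from the window-truncated boundary:
# smoothing the moved signal == moving the smoothed signal.
lhs = smooth(g_x)[k:-k]
rhs = (smooth(x) + v * t)[k:-k]
print(np.allclose(lhs, rhs))  # True
```

The commutation property checked here, f(g·x) = g·f(x), is the constraint an equivariant sequence model satisfies for every transformation in the family, rather than for one hand-picked example.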
We show that recurrent-depth transformers can be scaled into effective language models, with particularly strong gains on reasoning tasks from additional compute.
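To make the recurrent-depth idea concrete, here is a minimal sketch assuming only the core mechanism: one weight-tied block applied a variable number of times, so depth becomes an inference-time compute knob (the block, width, and step counts below are illustrative placeholders, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden width (illustrative)

# One weight-tied core block: the same parameters are reused at every depth step.
W = rng.normal(scale=d ** -0.5, size=(d, d))

def core(h, x):
    # Stand-in for a transformer block: mix the running state with the input.
    return np.tanh(h @ W + x)

def forward(x, num_steps):
    # Recurrent depth: more iterations of the same block = more compute spent.
    h = np.zeros_like(x)
    for _ in range(num_steps):
        h = core(h, x)
    return h

x = rng.normal(size=(8, d))       # a batch of token states (illustrative)
cheap = forward(x, num_steps=4)   # shallow unrolling for easy inputs
deep = forward(x, num_steps=32)   # deeper unrolling, e.g. for reasoning-heavy inputs
```

Because the block is weight-tied, the iteration count is not fixed by the parameter count; it can be raised at test time to trade compute for quality.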