4 papers across 3 sessions
We show that multiplicative (bilinear) hidden-state transitions are a natural choice for representing state-tracking behavior in linear recurrent networks.
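A minimal sketch of the idea (assumed illustration, not the paper's exact model): a linear recurrence whose transition matrix depends multiplicatively on the input, h_t = A(x_t) h_{t-1}, can track non-commutative state — here, composing permutations — which a purely diagonal (element-wise) transition cannot. The `SWAP`/`KEEP` inputs and the `run` helper are hypothetical names for illustration.

```python
import numpy as np

SWAP = np.array([[0., 1.], [1., 0.]])   # transition A(x) for input "swap"
KEEP = np.eye(2)                        # transition A(x) for input "keep"

def run(inputs, h0=np.array([1., 0.])):
    # Bilinear/multiplicative recurrence: h_t = A(x_t) @ h_{t-1}
    h = h0
    for x in inputs:
        A = SWAP if x == "swap" else KEEP
        h = A @ h
    return h

# Two swaps cancel, so the state returns to its start:
print(run(["swap", "keep", "swap"]))   # -> [1. 0.]
print(run(["swap", "keep"]))           # -> [0. 1.]
```

The point of the sketch: the final state is the ordered product of input-dependent matrices, so the network can represent group compositions that no commuting (diagonal) transition family can.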
We show that letting transformer depth grow slightly with input length increases their expressive power, under standard complexity conjectures.
We propose a contextualized position encoding using dynamic Householder matrices in place of static rotary ones, along with a hardware-efficient training algorithm that improves state tracking performance.
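A hedged sketch of the mechanism being described (assumptions: the per-token vectors `V`, the `householder` helper, and the cumulative-product construction are illustrative stand-ins, not the paper's implementation): instead of a fixed rotary rotation per position, one can build an orthogonal transform from content-dependent Householder reflections H(v) = I − 2vvᵀ/(vᵀv) and apply the running product of these reflections at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 5
V = rng.normal(size=(T, d))   # hypothetical per-token vectors driving the reflections

def householder(v):
    # H(v) = I - 2 v v^T for unit v: an orthogonal reflection.
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

# Cumulative product of reflections gives a position- AND content-dependent
# orthogonal transform at each step (vs. a static rotary rotation).
P = np.eye(d)
transforms = []
for t in range(T):
    P = householder(V[t]) @ P
    transforms.append(P)

# Each transform is orthogonal, so it preserves query/key norms,
# the same property that makes rotary encodings well-behaved.
q = rng.normal(size=d)
print(np.allclose(transforms[-1].T @ transforms[-1], np.eye(d)))           # True
print(np.isclose(np.linalg.norm(transforms[-1] @ q), np.linalg.norm(q)))   # True
```

Because products of orthogonal matrices stay orthogonal, the encoding remains norm-preserving while being contextualized by the input.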
Structured Linear Controlled Differential Equations: A unifying framework for sequence models with structured, input-dependent state-transition matrices