2 papers across 2 sessions
We show that letting transformers' depth grow slightly with the input length increases their expressive power under standard complexity conjectures.
We exactly characterize the expressive power of transformers with padding tokens as $\mathsf{TC}^0$, and we also characterize transformers with looping and padding.