We propose replacing self-attention layers with linear estimators, guided by a derived CCA error bound, to achieve inference speedups with a favorable accuracy trade-off.
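To make the core idea concrete, here is a minimal NumPy sketch of swapping one self-attention layer for a token-wise linear estimator. It rests on several assumptions that are not taken from the paper. The "linear estimator" is taken to be an ordinary least-squares map `y ≈ x @ w + b` fit on (input, output) pairs collected from the layer on calibration data. The attention layer is a toy single-head layer with random weights rather than a trained model. The CCA error bound itself is not reproduced here. All names (`self_attention`, `fit_linear_estimator`, `linear_replacement`) are hypothetical illustrations.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    # Toy single-head scaled dot-product self-attention over one sequence x of shape (T, d).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T): the quadratic-cost part
    return softmax(scores, axis=-1) @ v


def fit_linear_estimator(x, y):
    # Hypothetical estimator: least-squares fit of y ≈ x @ w + b, applied token by token.
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(x_aug, y, rcond=None)
    return coef[:-1], coef[-1]  # weight matrix (d, d), bias (d,)


def linear_replacement(x, w, b):
    # Drop-in replacement for the attention layer: no (T, T) score matrix,
    # so the cost per sequence is O(T * d^2) rather than including O(T^2 * d).
    return x @ w + b


rng = np.random.default_rng(0)
d, T, n_seq = 16, 32, 200
wq, wk, wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))

# Collect (input, output) token pairs from the attention layer on calibration sequences.
xs = rng.normal(size=(n_seq, T, d))
ys = np.stack([self_attention(x, wq, wk, wv) for x in xs])
w, b = fit_linear_estimator(xs.reshape(-1, d), ys.reshape(-1, d))

# Measure how well the linear estimator reproduces the layer on held-out sequences.
x_test = rng.normal(size=(20, T, d))
y_true = np.stack([self_attention(x, wq, wk, wv) for x in x_test])
y_lin = linear_replacement(x_test, w, b)
rel_err = np.linalg.norm(y_lin - y_true) / np.linalg.norm(y_true)
print(f"relative error of linear estimator: {rel_err:.3f}")
```

The relative error printed here only indicates how much of this toy layer's behavior a per-token linear map captures. In the proposed approach, deciding which layers can be replaced with an acceptable accuracy loss is the role of the derived CCA error bound, which this sketch does not implement.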