1 paper across 1 session
Theoretical study of the impact of normalization layers on the evolution of token representations as they propagate through the layers of a transformer.