3 papers across 2 sessions
We identify dimensional collapse in VQVAEs, where the codebook's effective dimensionality is surprisingly low, and investigate its implications and potential remedies.
A theoretical study of the impact of normalization layers on the evolution of token representations as they propagate through the layers of a transformer.