2 papers across 2 sessions
This paper proves that a block coordinate descent (BCD) algorithm trains deep neural networks to global minima for certain activation functions, with an extension to ReLU networks obtained via architectural modifications.
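As a concrete illustration of this style of training, here is a minimal NumPy sketch of block coordinate descent on a lifted two-layer network: the hidden activations are copied into an auxiliary block A1, a quadratic penalty couples A1 to sigma(W1 X), and the three blocks (W2, A1, W1) are updated in turn. The two-layer setup, tanh activation, penalty weight, and the plain gradient step on the W1 block are illustrative assumptions, not the paper's exact splitting or update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical dimensions, not from the paper).
n, d, h = 200, 5, 16
X = rng.standard_normal((d, n))
y = np.sin(X.sum(axis=0, keepdims=True))  # targets, shape (1, n)

sigma = np.tanh  # a smooth activation; the paper's guarantees depend on this choice

# Lifted variables: layer weights plus an auxiliary activation block A1.
W1 = rng.standard_normal((h, d)) * 0.1
W2 = rng.standard_normal((1, h)) * 0.1
A1 = sigma(W1 @ X)
gamma = 1.0   # penalty weight coupling A1 to sigma(W1 X)
lr = 1e-2     # step size for the nonconvex W1 block

def objective():
    fit = np.sum((y - W2 @ A1) ** 2)
    couple = np.sum((A1 - sigma(W1 @ X)) ** 2)
    return fit + gamma * couple

for it in range(500):
    # Block 1: W2 update is an exact least-squares solve (A1 fixed).
    W2 = y @ A1.T @ np.linalg.pinv(A1 @ A1.T)

    # Block 2: A1 update is also least squares (W1, W2 fixed):
    # minimize ||y - W2 A1||^2 + gamma ||A1 - sigma(W1 X)||^2.
    lhs = W2.T @ W2 + gamma * np.eye(h)
    rhs = W2.T @ y + gamma * sigma(W1 @ X)
    A1 = np.linalg.solve(lhs, rhs)

    # Block 3: one gradient step on W1 (its subproblem is nonconvex).
    Z = W1 @ X
    grad_W1 = -2 * gamma * (((A1 - sigma(Z)) * (1 - sigma(Z) ** 2)) @ X.T)
    W1 -= lr * grad_W1

print("final objective:", objective())
```

Updating one block at a time means two of the three subproblems are exact convex solves, which is the structural property that global-convergence analyses of BCD typically exploit.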
This study provides an information-theoretic analysis of discrete latent variables in VQ-VAEs, deriving a novel generalization error bound that depends on the complexity of the discrete latent variables and of the encoder.
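The discrete latents the analysis targets come from the vector-quantization step of a VQ-VAE: each continuous encoder output is snapped to its nearest codebook entry, so a single code over K entries carries at most log2(K) bits, the kind of latent complexity such bounds typically scale with. The sketch below shows only that quantization step; the codebook size, latent dimension, and helper name `quantize` are hypothetical, and the paper's actual bound is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: K codebook vectors of dimension d_z.
K, d_z = 64, 8
codebook = rng.standard_normal((K, d_z))

def quantize(z_e):
    """Map each encoder output to its nearest codebook entry.

    z_e: (batch, d_z) continuous encoder outputs.
    Returns the discrete indices and the quantized vectors z_q.
    """
    # Squared distances to every codebook vector: (batch, K).
    d2 = np.sum((z_e[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    idx = np.argmin(d2, axis=-1)  # discrete latent codes
    return idx, codebook[idx]

z_e = rng.standard_normal((4, d_z))
idx, z_q = quantize(z_e)
print(idx)  # each sample's latent is one of K symbols: at most log2(K) bits
```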