5 papers across 3 sessions
We show that deep neural networks, across architectures and training conditions, all instantiate the same abstract algorithm for modular addition.
We prove that the interaction of parameter symmetry and equivariance constraints can create critical points and minima in the loss landscape.