The loss landscapes of MLPs contain "channels to infinity": regions where a pair of neurons aligns its incoming weights while its output weights diverge to infinity with opposite signs, so that together the pair approximates a gated linear unit. These channels look like flat minima, but the loss actually keeps decreasing slowly along them
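
A minimal sketch of the limit, under assumed notation not taken from the source (σ is the activation, x the input, w the shared incoming weight; a, ā, and δ are illustrative parameters): give the pair output weights a and (ā − a) and incoming weights w + δ/a and w. A first-order Taylor expansion in 1/a gives

$$a\,\sigma\!\Big(\big(w + \tfrac{\delta}{a}\big)^{\top} x\Big) + (\bar{a} - a)\,\sigma(w^{\top} x) \;\longrightarrow\; \bar{a}\,\sigma(w^{\top} x) + \sigma'(w^{\top} x)\,(\delta^{\top} x) \qquad \text{as } a \to \infty,$$

where the limit term σ′(wᵀx)(δᵀx) is a gated linear unit: the gate σ′(wᵀx) multiplies the linear function δᵀx. At any finite a the pair only approximates this unit, which is why the loss along the channel is not exactly flat but decreases slowly as a grows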