2 papers across 2 sessions
We identified a new and interesting property of nonlinear activations: they give better feature separation for similar inputs and better NTK conditioning.
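A minimal numerical sketch of the conditioning claim (not from the papers themselves): using the standard closed-form arc-cosine kernel expressions for an infinite-width single-hidden-layer ReLU network, we can compare the condition number of the NTK Gram matrix against that of a linear (identity-activation) kernel for two nearly identical unit inputs. The angle `theta` and the two-point setup are illustrative assumptions.

```python
import numpy as np

def linear_gram(theta):
    # Gram matrix of the linear (identity activation) kernel
    # for two unit vectors separated by angle theta.
    c = np.cos(theta)
    return np.array([[1.0, c], [c, 1.0]])

def relu_ntk_gram(theta):
    # Infinite-width NTK Gram matrix of a single-hidden-layer ReLU
    # network for two unit vectors at angle theta, built from the
    # standard arc-cosine kernel formulas.
    k0 = (np.pi - theta) / np.pi                              # derivative kernel
    k1 = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi
    off = np.cos(theta) * k0 + k1                             # off-diagonal entry
    diag = 2.0                                                # theta = 0 on the diagonal
    return np.array([[diag, off], [off, diag]])

theta = 0.01  # two nearly identical inputs
cond_linear = np.linalg.cond(linear_gram(theta))
cond_relu = np.linalg.cond(relu_ntk_gram(theta))
print(f"linear kernel cond: {cond_linear:.1f}, ReLU NTK cond: {cond_relu:.1f}")
```

For this angle the linear Gram matrix is nearly singular (condition number on the order of 1/theta^2), while the ReLU NTK Gram is orders of magnitude better conditioned, consistent with the separation property above.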
We prove quantitative convergence estimates for single-layer neural networks in the NTK regime to Gaussian processes at positive training time.