1 paper across 1 session
Large neural networks first learn low dimensional feature representation then overfit the data and revert to a kernel regime.