1 paper across 1 session
Scaling neural networks leads to compositional generalization if the training distribution sufficiently covers the task space.