3 papers across 3 sessions
We investigate basic questions about how neural networks learn and represent skills, questions relevant to the problem of creating narrow AI systems.
Bipartite mutual information in natural text exhibits sub-volume growth; from this, we prove a lower bound on how a model's history state must scale with context length, a necessary condition for an architecture to be effective at long-context language modeling.
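The core information-theoretic step can be illustrated numerically. The sketch below is not the paper's construction: it is a minimal toy example of the underlying data-processing fact that if the future depends on the past only through a history state, then I(past; future) <= H(state), so any architecture whose state entropy grows slower than the text's bipartite mutual information cannot model the dependence. All distributions here (a 4-value past, a 1-bit state, a noisy binary future) are invented for illustration.

```python
from collections import defaultdict
import math

# Toy setup (hypothetical, for illustration): the past takes 4 equally
# likely values, the history state keeps only 1 bit of it, and the
# future is a noisy copy of that bit.
p_past = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def state(x):
    return x % 2  # 1-bit history state

# p(future | state): a binary symmetric channel with 10% flip probability
p_future_given_state = {0: {0: 0.9, 1: 0.1},
                        1: {0: 0.1, 1: 0.9}}

def entropy(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Joint distribution p(past, future); future depends on past only via state.
joint = {}
for x, px in p_past.items():
    for y, py in p_future_given_state[state(x)].items():
        joint[(x, y)] = px * py

# Marginals
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Mutual information I(past; future) = H(X) + H(Y) - H(X, Y)
I_xy = entropy(p_x) + entropy(p_y) - entropy(joint)

# Entropy of the history state
p_s = defaultdict(float)
for x, px in p_past.items():
    p_s[state(x)] += px
H_state = entropy(p_s)

print(f"I(past; future) = {I_xy:.3f} bits <= H(state) = {H_state:.3f} bits")
```

Running this prints an I(past; future) of about 0.53 bits, strictly below the 1-bit state entropy: the state is an information bottleneck. The paper's argument runs this logic in reverse: since the measured bipartite mutual information of text grows with length, the history state's capacity must grow at least as fast.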