1 paper across 1 session
Bipartite mutual information in natural text exhibits sub-volume growth; from this, we prove a lower bound on how the history state must scale, setting a necessary condition for architectures to be effective at long-context language modeling.