1 paper across 1 session
Larger vocabulary lowers language modeling difficulty by facilitating models to learn non-i.i.d patterns in text more easily