2 papers across 2 sessions
We analyze the effective depth of LLMs and find that they are unlikely to compose subresults; instead, deeper models spread the same type of computation performed by shallow models across more layers.