We fit scaling laws for large language models with varying width-to-depth ratios and parameter counts.
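The sentence above does not say which functional form is fitted. The sketch below assumes a pure power law in parameter count, L(N) = a * N^(-alpha), fitted by ordinary least squares on log L versus log N. The function name `fit_power_law` and the synthetic data are hypothetical illustrations, not the paper's method or results. Under this assumption, the effect of the width-to-depth ratio could be examined by running the fit separately for each ratio and comparing the recovered exponents.

```python
import math


def fit_power_law(params, losses):
    """Fit L(N) = a * N**(-alpha) by least squares in log-log space.

    Hypothetical functional form: the source does not state which law is fit.
    Returns the tuple (a, alpha).
    """
    if len(params) != len(losses) or len(params) < 2:
        raise ValueError("need at least two (N, L) pairs of equal length")
    xs = [math.log(n) for n in params]  # log parameter count
    ys = [math.log(l) for l in losses]  # log loss
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter counts must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # log L = log a - alpha * log N, so alpha = -slope and a = exp(intercept)
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data from a = 10.0, alpha = 0.076
    # (illustrative values only, not results from the paper).
    sizes = [1e7, 1e8, 1e9, 1e10]
    observed = [10.0 * n ** -0.076 for n in sizes]
    a, alpha = fit_power_law(sizes, observed)
    print(f"a={a:.3f}, alpha={alpha:.4f}")
```

Fitting in log space turns the power law into a straight line, so the fit needs only a closed-form linear regression. A law with an irreducible-loss offset, L(N) = a * N^(-alpha) + c, would instead require a nonlinear optimizer.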