2 papers across 2 sessions
We introduce $\mu$PC, a reparameterisation of predictive coding networks that enables stable training of 100+ layer ResNets on simple tasks with hyperparameter transfer.
Scaling Diffusion Transformers up to 18B Efficiently via $\mu$P
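Both entries build on $\mu$P-style parameterisation, whose practical payoff is hyperparameter transfer: tune the learning rate once at a small width, then reuse it at large width with a per-layer rescaling. A minimal sketch of the commonly cited $\mu$P rule for Adam (hidden-weight learning rates shrink like 1/width); this is an illustrative assumption, not code from either paper, and `mup_adam_lr` is a hypothetical helper:

```python
def mup_adam_lr(base_lr: float, base_width: int, width: int, layer: str) -> float:
    """Per-layer Adam learning rate at `width`, given a base learning
    rate tuned at `base_width` (hypothetical helper, muP-style rule)."""
    if layer == "hidden":
        # Weight matrices whose fan-in grows with width: scale lr by 1/width.
        return base_lr * base_width / width
    elif layer in ("input", "bias"):
        # Parameters whose fan-in does not grow with width: keep lr fixed.
        return base_lr
    raise ValueError(f"unknown layer kind: {layer}")

# Tune once at width 256, then reuse the same base_lr at width 4096:
base_lr, base_width = 1e-3, 256
hidden_lr = mup_adam_lr(base_lr, base_width, 4096, "hidden")  # 16x smaller
```

Under this rule the optimal base learning rate found on a narrow proxy model stays (approximately) optimal as width grows, which is what makes tuning an 18B diffusion transformer via a small proxy cheap.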