2 papers across 1 session
We investigate the role of learning rate grafting and the staleness of the preconditioner in Shampoo by decoupling the updates of the eigenvalues and eigenbasis of its preconditioner.