We develop ASGO, an optimizer that provably exploits the low-rank structure of gradients and the block-wise diagonal structure of Hessians that arise in training.
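The abstract does not spell out the update rule, so the following is only a minimal illustrative sketch of the kind of optimizer this describes: a one-sided matrix preconditioner that accumulates an exponential moving average of G Gᵀ (which inherits any low-rank structure of the gradients) and preconditions the momentum with its inverse square root. The function name `asgo_like_step` and all hyperparameter values are hypothetical, not taken from the paper.

```python
import numpy as np

def asgo_like_step(W, G, M, V, lr=1e-3, beta1=0.9, beta2=0.99, eps=1e-8):
    """One step of a one-sided matrix-preconditioned update (illustrative only).

    W : (m, n) weight matrix      G : (m, n) gradient
    M : (m, n) momentum buffer    V : (m, m) preconditioner buffer

    V accumulates G @ G.T, so it captures the column space of the
    gradients; when gradients are low-rank, V concentrates on that
    subspace, which is the structure the abstract says ASGO exploits.
    """
    M = beta1 * M + (1 - beta1) * G            # heavy-ball momentum
    V = beta2 * V + (1 - beta2) * G @ G.T      # EMA of the matrix second moment
    # Inverse square root of the PSD matrix V via eigendecomposition.
    vals, vecs = np.linalg.eigh(V)
    inv_sqrt = vecs @ np.diag(1.0 / (np.sqrt(np.maximum(vals, 0.0)) + eps)) @ vecs.T
    W = W - lr * inv_sqrt @ M                  # precondition on the left side
    return W, M, V

# Usage: a single step on a random 4x8 matrix parameter.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
G = rng.standard_normal((4, 8))
M, V = np.zeros_like(W), np.zeros((4, 4))
W, M, V = asgo_like_step(W, G, M, V)
```

Because the preconditioner is a full m×m matrix per weight matrix (rather than per-coordinate, as in Adam), it can adapt to correlated directions within each block, which is consistent with the block-wise diagonal Hessian picture; the exact exponent and factorization ASGO uses should be taken from the paper itself.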