Researcher, Microsoft
1 paper at NeurIPS 2025
We introduce gradient multi-normalization, a new design principle for LLM matrix optimizers that unifies prior work and enables faster, more memory-efficient training of LLMs.
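As a rough intuition for the principle (this sketch is my own illustration, not the paper's algorithm: the function name, the choice of row/column norms, and the iteration count are all assumptions), one can picture multi-normalization as alternately rescaling a gradient matrix so that it approximately satisfies several norm constraints at once:

```python
import numpy as np

def multi_normalize(grad, num_iters=3, eps=1e-8):
    """Hypothetical sketch of gradient multi-normalization:
    alternately normalize the rows and columns of a gradient
    matrix so it approximately satisfies both constraints."""
    g = grad.astype(np.float64).copy()
    for _ in range(num_iters):
        # Rescale each row to (roughly) unit Euclidean norm.
        g /= np.linalg.norm(g, axis=1, keepdims=True) + eps
        # Then rescale each column to (roughly) unit Euclidean norm.
        g /= np.linalg.norm(g, axis=0, keepdims=True) + eps
    return g

rng = np.random.default_rng(0)
update = multi_normalize(rng.normal(size=(4, 6)))
```

Because the update is computed directly from the current gradient, no per-parameter optimizer state needs to be stored, which is where the memory savings in such stateless schemes come from.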