Assistant Professor, University of Minnesota - Twin Cities
3 papers at NeurIPS 2025
This paper demonstrates that low-precision arithmetic makes LLM inference non-reproducible across different hardware and software setups, and proposes LayerCast, a hybrid-precision method that computes in FP32 to achieve determinism while still saving memory.
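A minimal NumPy sketch of the hybrid-precision idea (illustrative only, not the paper's actual implementation): weights stay in 16-bit storage to save memory, but each layer upcasts to FP32 just before the matmul so accumulation is deterministic.

```python
import numpy as np

def layer_forward_fp32(x_fp16, w_fp16):
    """Upcast 16-bit inputs and weights to FP32 just-in-time for compute.

    Storage stays at half precision (memory savings); the matmul and its
    accumulation run entirely in FP32 (deterministic results)."""
    x32 = x_fp16.astype(np.float32)
    w32 = w_fp16.astype(np.float32)
    return x32 @ w32

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float16)  # weights stored in FP16
x = rng.standard_normal((1, 64)).astype(np.float16)

out1 = layer_forward_fp32(x, w)
out2 = layer_forward_fp32(x, w)
assert out1.dtype == np.float32      # compute happened in full precision
assert np.array_equal(out1, out2)    # bit-identical across repeated runs
```

The function name `layer_forward_fp32` is hypothetical; the point is only the store-low / compute-high split.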
Uncovering the Role of Long-Context Ability in Reasoning Training
We introduce a new method for selecting subspaces in low-rank optimization for memory-efficient pretraining of large language models (LLMs).
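To make the setting concrete, here is a generic low-rank optimization step in NumPy (an illustrative sketch, not the paper's method): the gradient is projected into an r-dimensional subspace, here chosen naively via SVD, and the update is mapped back to full dimension. The paper's contribution concerns how that subspace is selected; the helper names below are hypothetical.

```python
import numpy as np

def top_r_subspace(grad, r):
    """Orthonormal basis (m x r) for the top-r left singular subspace of the gradient.
    A naive baseline choice; the subspace-selection rule is the design variable."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :r]

def lowrank_sgd_step(w, grad, P, lr=0.1):
    """Project the gradient into the subspace, then apply the update in full space.
    Optimizer state would live in the compact r x n representation, saving memory."""
    g_low = P.T @ grad            # r x n: compressed gradient
    return w - lr * (P @ g_low)   # lifted back to m x n

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))
grad = rng.standard_normal((8, 4))
P = top_r_subspace(grad, r=2)     # 8x2 basis instead of 8x4 state
w_new = lowrank_sgd_step(w, grad, P)
```

Memory efficiency comes from keeping moment estimates (for Adam-style optimizers) only in the r-dimensional projected space.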