PhD student, Rice University
1 paper at NeurIPS 2025
This paper demonstrates that low precision causes non-reproducible LLM inference across different setups, proposing a hybrid-precision method, LayerCast, that computes in FP32 to achieve determinism while saving memory.