1 paper across 1 session
This paper demonstrates that low precision causes non-reproducible LLM inference across different setups, proposing a hybrid-precision method, LayerCast, that computes in FP32 to achieve determinism while saving memory.