This paper presents LittleBit, a novel framework that combines latent matrix factorization with a multi-scale compensation mechanism to compress Large Language Models (LLMs) to ultra-low bit widths.
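To make the core idea concrete, the sketch below illustrates one plausible reading of the technique: a weight matrix is approximated by binarized low-rank factors, with real-valued scales at multiple granularities (here, per rank component) compensating for the magnitudes lost to binarization. This is a minimal illustration of the general approach, not the paper's actual algorithm; the function name and the specific choice of scales are assumptions.

```python
import numpy as np

def littlebit_like_compress(W, rank=8):
    """Hypothetical sketch: approximate W with sign-binarized low-rank
    factors plus per-rank-component scales. An illustration of the idea,
    not the paper's method."""
    # Initialize the factors from a truncated SVD of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * np.sqrt(S[:rank])      # (m, rank)
    V_r = Vt[:rank, :].T * np.sqrt(S[:rank])   # (n, rank)

    # Binarize each factor to {-1, +1}; the 1-bit entries are what
    # would be stored, giving sub-1-bit-per-weight cost when rank << n.
    Bu, Bv = np.sign(U_r), np.sign(V_r)
    Bu[Bu == 0] = 1.0
    Bv[Bv == 0] = 1.0

    # Per-rank-component scales compensate for the discarded magnitudes
    # (one simple choice: mean absolute value of each factor column).
    g = np.mean(np.abs(U_r), axis=0) * np.mean(np.abs(V_r), axis=0)

    # Reconstruction: (Bu * g) @ Bv.T  ==  Bu @ diag(g) @ Bv.T
    W_hat = (Bu * g) @ Bv.T
    return W_hat, (Bu, Bv, g)
```

At inference time, only the binary factors and the small scale vectors would need to be stored, so the effective cost per original weight shrinks with the rank; a full multi-scale scheme would add further per-row and per-column scales along the same lines.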