2 papers across 2 sessions
This paper presents LittleBit, a novel framework that combines latent matrix factorization with a multi-scale compensation mechanism to compress Large Language Models (LLMs) to ultra-low bit widths.
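To make the latent-factorization idea concrete, here is a minimal illustrative sketch of compressing a weight matrix via a truncated rank-r factorization. This is a generic low-rank example, not the LittleBit algorithm itself; the matrix size and rank are assumed for illustration.

```python
import numpy as np

# Illustrative only: approximate a weight matrix W with a rank-r
# factorization U_r @ diag(s_r) @ Vt_r, storing far fewer parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

r = 32  # latent rank (assumed hyperparameter)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_hat = U[:, :r] * s[:r] @ Vt[:r, :]  # best rank-r reconstruction

orig_params = W.size                          # 512 * 512
fact_params = r * (W.shape[0] + W.shape[1])   # r * (m + n)
print(f"compression ratio: {orig_params / fact_params:.1f}x")
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

In a real ultra-low-bit scheme the factors themselves would also be quantized, with a compensation mechanism recovering the accuracy lost at extreme bit widths.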
We unify the diverse attention patterns of visual generative models, which benefits both attention sparsification and quantization.