3 papers across 1 session
We propose a bilevel framework that jointly learns layer-wise sparsity and low-rank structure via truncated Gaussian sampling and efficient matrix approximation, achieving better performance for compressing LLMs.