Poster Session 5 · Friday, December 5, 2025 11:00 AM → 2:00 PM
#905
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Abstract
This paper proposes a novel matrix quantization method, Binary Quadratic Quantization (BQQ). In contrast to conventional first-order quantization approaches, such as uniform quantization and binary coding quantization, which approximate real-valued matrices via linear combinations of binary bases, BQQ leverages the expressive power of binary quadratic expressions while maintaining an extremely compact data format.
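To make the contrast concrete, the sketch below compares greedy first-order binary coding quantization (a linear combination of sign matrices with scalar coefficients) against a hypothetical second-order refinement that adds a scaled product of two binary matrices. The quadratic construction here (`B1 @ B2` with a least-squares scale) is an illustration of the general idea only, not the paper's actual BQQ algorithm, whose formulation is not specified in this abstract.

```python
import numpy as np

def binary_coding_quantize(W, k=2):
    """First-order approximation: W ~ sum_i alpha_i * B_i with B_i in {-1, +1}."""
    residual = W.copy()
    approx = np.zeros_like(W)
    for _ in range(k):
        B = np.sign(residual)
        B[B == 0] = 1.0
        alpha = np.abs(residual).mean()  # greedy scalar coefficient
        approx += alpha * B
        residual -= alpha * B
    return approx

def quadratic_binary_refine(W, k=2):
    """Illustrative second-order term: add beta * (B1 @ B2), B1, B2 binary.

    The product of two binary matrices is no longer binary, so it can
    express patterns a linear combination of binary bases cannot,
    while still storing only bits plus one scalar.
    (Hypothetical construction; the paper's BQQ may differ.)
    """
    approx = binary_coding_quantize(W, k)
    residual = W - approx
    n = W.shape[1]
    B1 = np.sign(residual); B1[B1 == 0] = 1.0
    B2 = np.sign(residual.T @ residual); B2[B2 == 0] = 1.0
    Q = (B1 @ B2) / n                      # binary-quadratic basis term
    beta = np.sum(residual * Q) / np.sum(Q * Q)  # least-squares scale
    return approx + beta * Q

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
err_first = np.linalg.norm(W - binary_coding_quantize(W, 2))
err_quad = np.linalg.norm(W - quadratic_binary_refine(W, 2))
```

Because `beta` is the least-squares-optimal scale for the quadratic term, the refined residual is never larger than the first-order one, which mirrors the abstract's claim that quadratic expressions improve the memory/reconstruction-error trade-off.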
We validate our approach with two experiments: a matrix compression benchmark and post-training quantization (PTQ) on pretrained Vision Transformer-based models. Experimental results demonstrate that BQQ consistently achieves a better trade-off between memory efficiency and reconstruction error than conventional methods for compressing diverse matrix data. It also delivers strong PTQ performance, even though we neither target state-of-the-art PTQ accuracy under tight memory constraints nor rely on PTQ-specific binary matrix optimization.
For example, our proposed method outperforms the state-of-the-art PTQ method by up to 2.2% and 59.1% on the ImageNet dataset under the calibration-based and data-free scenarios, respectively, with quantization equivalent to 2 bits. These findings highlight the surprising effectiveness of binary quadratic expressions for efficient matrix approximation and neural network compression.