Poster Session 5 · Friday, December 5, 2025 11:00 AM → 2:00 PM
#905
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Abstract
This paper proposes a novel matrix quantization method, Binary Quadratic Quantization (BQQ). In contrast to conventional first-order quantization approaches, such as uniform quantization and binary coding quantization, which approximate real-valued matrices via linear combinations of binary bases, BQQ leverages the expressive power of binary quadratic expressions while maintaining an extremely compact data format.
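To make the contrast concrete, the sketch below compares greedy first-order binary coding quantization (a linear combination of sign matrices with scalar coefficients) against a hypothetical second-order refinement that adds a scaled product of two binary matrices. The quadratic construction here (`B1 @ B2` with a least-squares scale) is an illustration of the general idea only, not the paper's actual BQQ algorithm, whose formulation is not specified in this abstract.

```python
import numpy as np

def binary_coding_quantize(W, k=2):
    """First-order approximation: W ~ sum_i alpha_i * B_i with B_i in {-1, +1}."""
    residual = W.copy()
    approx = np.zeros_like(W)
    for _ in range(k):
        B = np.sign(residual)
        B[B == 0] = 1.0
        alpha = np.abs(residual).mean()  # greedy scalar coefficient
        approx += alpha * B
        residual -= alpha * B
    return approx

def quadratic_binary_refine(W, k=2):
    """Illustrative second-order term: add beta * (B1 @ B2), B1, B2 binary.

    The product of two binary matrices is no longer binary, so it can
    express patterns a linear combination of binary bases cannot,
    while still storing only bits plus one scalar.
    (Hypothetical construction; the paper's BQQ may differ.)
    """
    approx = binary_coding_quantize(W, k)
    residual = W - approx
    n = W.shape[1]
    B1 = np.sign(residual); B1[B1 == 0] = 1.0
    B2 = np.sign(residual.T @ residual); B2[B2 == 0] = 1.0
    Q = (B1 @ B2) / n                      # binary-quadratic basis term
    beta = np.sum(residual * Q) / np.sum(Q * Q)  # least-squares scale
    return approx + beta * Q

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
err_first = np.linalg.norm(W - binary_coding_quantize(W, 2))
err_quad = np.linalg.norm(W - quadratic_binary_refine(W, 2))
```

Because `beta` is the least-squares-optimal scale for the quadratic term, the refined residual is never larger than the first-order one, which mirrors the abstract's claim that quadratic expressions improve the memory/reconstruction-error trade-off.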
We validate our approach with two experiments: a matrix compression benchmark and post-training quantization (PTQ) on pretrained Vision Transformer-based models. Experimental results demonstrate that BQQ consistently achieves a better trade-off between memory efficiency and reconstruction error than conventional methods for compressing diverse matrix data. It also delivers strong PTQ performance, even though we neither target state-of-the-art PTQ accuracy under tight memory constraints nor rely on PTQ-specific binary matrix optimization.
For example, our proposed method outperforms the state-of-the-art PTQ method by up to 2.2% and 59.1% on the ImageNet dataset under the calibration-based and data-free scenarios, respectively, with quantization equivalent to 2 bits. These findings highlight the surprising effectiveness of binary quadratic expressions for efficient matrix approximation and neural network compression.