Neural Processing Research Center

2 papers across 1 session

Poster Session 6

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

We develop Q-Palette, a quantizer suite with efficient CUDA inference kernels and wide fractional-bit support, enabling mixed-scheme quantization that achieves ~36% faster LLM decoding than NormalFloat while improving accuracy.

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction

#3513 · Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song

We propose a novel query-agnostic KV cache eviction method for multi-query scenarios.