PhD student, Seoul National University
1 paper at NeurIPS 2025
We develop Q-Palette, a suite of quantizers with efficient CUDA inference kernels and broad support for fractional bit-widths. It enables mixed-scheme quantization that achieves ~36% faster LLM decoding than NormalFloat while also improving accuracy.