cost efficiency - NeurIPS 2025

cost efficiency

1 paper across 1 session

Poster Session 1

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

HiFC: High-efficiency Flash-based KV Cache Swapping for Scaling LLM Inference

#4204 · Inho Jeong, Sunghyeon Woo, Sol Namkung, Dongsuk Jeon

HiFC swaps LLM KV caches directly between the GPU and a pSLC SSD, matching the throughput of DRAM-based swapping while removing DRAM from the swap path and cutting cost five-fold.