1 paper across 1 session
Neuron Chunking is a hardware-aware sparsification framework for vision-language model (VLM) inference. It abstracts neuron access patterns into contiguity distributions, which couples neuron selection with flash I/O behavior and improves I/O efficiency.
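As a rough illustration of the idea, the sketch below assumes that a "contiguity distribution" can be read as a histogram of run lengths over the selected neuron indices, and that each contiguous run maps to one flash read. Both are assumptions made for this example; the summary does not give the paper's definition, and the function names `contiguous_runs` and `contiguity_distribution` are hypothetical.

```python
from collections import Counter


def contiguous_runs(selected):
    """Group neuron indices into (start, length) runs of adjacent indices.

    Assumption: neurons are laid out in storage in index order, so adjacent
    indices can be fetched with a single sequential read.
    """
    runs = []
    for idx in sorted(set(selected)):
        if runs and idx == runs[-1][0] + runs[-1][1]:
            # idx directly follows the last run, so extend that run by one.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap before idx, so start a new run.
            runs.append((idx, 1))
    return runs


def contiguity_distribution(selected):
    """Return {run_length: number_of_runs} for the selected indices."""
    return dict(Counter(length for _, length in contiguous_runs(selected)))


# Two selections of six neurons each, differing only in layout.
scattered = [0, 2, 4, 6, 8, 10]
chunked = [0, 1, 2, 8, 9, 10]

scattered_runs = contiguous_runs(scattered)
chunked_runs = contiguous_runs(chunked)
scattered_dist = contiguity_distribution(scattered)
chunked_dist = contiguity_distribution(chunked)
```

Under these assumptions, both selections keep the same number of neurons, but the scattered one needs six separate reads while the chunked one needs two. A selection rule that accounts for this difference is one way to read "coupling neuron selection with flash I/O behavior".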