Researcher, d-Matrix
1 paper at NeurIPS 2025
Our work, Mustafar, unlocks 70% sparsity in KV cache pruning by leveraging unstructured sparsity patterns, supported by a custom attention kernel, and boosts the inference efficiency of LLMs.
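The core idea behind unstructured KV cache pruning can be sketched as element-wise magnitude thresholding: each cache entry is kept or zeroed individually, with no block or row constraints. The toy NumPy sketch below illustrates this at 70% sparsity; the function name, shapes, and thresholding rule are illustrative assumptions, not Mustafar's actual implementation or kernel.

```python
import numpy as np

def prune_kv_unstructured(kv: np.ndarray, sparsity: float = 0.7) -> np.ndarray:
    """Zero out the smallest-magnitude entries of a KV cache tensor.

    Unstructured pruning: every element is dropped or kept on its own,
    independent of its neighbors (no structured block/row pattern).
    Illustrative sketch only, not the Mustafar kernel.
    """
    k = int(kv.size * sparsity)  # number of elements to zero out
    if k == 0:
        return kv.copy()
    flat = np.abs(kv).ravel()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(kv) > threshold
    return kv * mask

# Toy cache: (heads, tokens, head_dim) with made-up small sizes
rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 8, 16))
pruned = prune_kv_unstructured(kv, sparsity=0.7)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size
```

A sparse attention kernel would then operate on the compressed representation of `pruned` (e.g., storing only nonzero values and their indices) rather than the dense tensor, which is where the inference-efficiency gains come from.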