PhD student, Yale University
1 paper at NeurIPS 2025
We propose DuoGPT, a training-free pruning framework that integrates activation sparsity into OBC framework to enable efficient dual-sparse LLM inference with state-of-the-art accuracy–efficiency trade-offs and scalability.