Full Professor, Institute of Science and Technology
5 papers at NeurIPS 2025
We propose a parallel generation method for LLMs in which multiple instances synchronize through a shared, dynamically updated attention cache.
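As a rough illustration of the general idea only (not the paper's implementation), the sketch below has several decoding instances append the key/value vectors they produce to one shared cache that every instance attends over on its next step; the class and function names (SharedKVCache, toy_attention, worker_step) and the interleaved scheduling are placeholders of ours.

```python
# Toy sketch: several generation instances share one append-only attention cache.
import numpy as np

class SharedKVCache:
    """Append-only key/value store shared by all generation instances."""
    def __init__(self, dim):
        self.keys = np.zeros((0, dim))
        self.values = np.zeros((0, dim))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

def toy_attention(query, cache):
    """Single-head dot-product attention over everything currently in the shared cache."""
    scores = cache.keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values

def worker_step(cache, dim, rng):
    """One decoding step of one instance: read the shared cache, then extend it."""
    query = rng.standard_normal(dim)
    context = toy_attention(query, cache)
    # Pretend the "generated token" yields new key/value vectors.
    new_k, new_v = rng.standard_normal(dim), context + rng.standard_normal(dim)
    cache.append(new_k, new_v)  # now visible to every other instance

rng = np.random.default_rng(0)
cache = SharedKVCache(dim=8)
cache.append(rng.standard_normal(8), rng.standard_normal(8))  # seed with a prompt token
for step in range(3):
    for worker in range(4):  # 4 parallel instances, interleaved here for simplicity
        worker_step(cache, 8, rng)
print(cache.keys.shape)  # (13, 8): 1 prompt entry + 4 workers * 3 steps
```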
Influence Distillation is a mathematically justified data selection method for LLM fine-tuning that assigns optimal weights to training samples, achieving performance on par with or better than the state of the art while being substantially faster.
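For intuition only, here is a toy sketch of gradient-alignment-based sample weighting, the general flavor of influence-style data selection; the linear-regression stand-in model, the clipped inner-product weighting rule, and all variable names are illustrative assumptions of ours, not the formulation from the paper.

```python
# Toy sketch: weight each training sample by how well its gradient aligns
# with the gradient of a small target set.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_target, d = 200, 20, 16

# Toy linear-regression stand-in: parameters w, squared-error loss.
w = rng.standard_normal(d) * 0.1
X_train, y_train = rng.standard_normal((n_train, d)), rng.standard_normal(n_train)
X_target, y_target = rng.standard_normal((n_target, d)), rng.standard_normal(n_target)

def per_sample_grads(X, y, w):
    # For loss_i = 0.5 * (x_i . w - y_i)^2, grad_i = (x_i . w - y_i) * x_i
    residual = X @ w - y
    return residual[:, None] * X                      # shape (n, d)

g_train = per_sample_grads(X_train, y_train, w)                   # one gradient per sample
g_target = per_sample_grads(X_target, y_target, w).mean(axis=0)   # target-set gradient

# Samples whose gradients point "toward" the target objective get larger weights.
alignment = g_train @ g_target
weights = np.clip(alignment, 0.0, None)
weights = weights / weights.sum() if weights.sum() > 0 else np.full(n_train, 1.0 / n_train)

top = np.argsort(weights)[-5:][::-1]
print("highest-weight samples:", top, weights[top].round(4))
```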
We provide a method for accurate end-to-end FP4 training of Large Language Models.
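As a rough sketch of one ingredient of low-precision training, the snippet below maps a tensor onto the 4-bit E2M1 (FP4) grid with a per-tensor scale; the scaling granularity, rounding rule, and function names are illustrative choices of ours, not the training recipe from the paper.

```python
# Toy sketch: round a tensor to the nearest value on the FP4 (E2M1) grid.
import numpy as np

# Representable magnitudes of the E2M1 FP4 format (plus sign).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x, eps=1e-12):
    """Scale x so its max magnitude maps to 6, then round to the nearest FP4 value."""
    scale = np.max(np.abs(x)) / 6.0 + eps
    scaled = np.abs(x) / scale
    idx = np.argmin(np.abs(scaled[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx], scale

def dequantize_fp4(q, scale):
    return q * scale

x = np.random.default_rng(0).standard_normal(8)
q, scale = quantize_fp4(x)
print("original:    ", x.round(3))
print("fp4 (scaled):", dequantize_fp4(q, scale).round(3))
```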
A low-precision scheme for fine-tuning LLMs.
We investigate new scaling laws that predict how LLM performance scales when training over quantized or sparse representations.
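To make the setting concrete, the toy example below evaluates a standard Chinchilla-style loss form with an assumed "effective parameter count" multiplier for quantized or sparse training; the functional form, the multiplier values, and every constant are placeholder choices for illustration, not the fitted laws from the paper.

```python
# Toy sketch: a Chinchilla-style scaling law L(N, D) = E + A/N^alpha + B/D^beta,
# evaluated with an effective parameter count N_eff = eff * N, where eff in (0, 1]
# stands in for the capacity retained at a given precision or sparsity level.
def scaling_law_loss(N, D, eff=1.0,
                     E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    n_eff = eff * N
    return E + A / n_eff**alpha + B / D**beta

N, D = 1e9, 2e10  # 1B parameters, 20B tokens (toy numbers)
for label, eff in [("bf16 (eff=1.0)", 1.0),
                   ("4-bit (eff=0.7, assumed)", 0.7),
                   ("2:4 sparse (eff=0.6, assumed)", 0.6)]:
    print(f"{label}: predicted loss = {scaling_law_loss(N, D, eff):.3f}")
```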