Full Professor, Harbin Institute of Technology
1 paper at NeurIPS 2025
We "clone" large LLMs into small SLMs by training only low-rank projection matrices for the weights while constraining every student activation to match the teacher's. This yields comparable SLM performance with 1000x fewer training tokens.
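The core mechanism can be illustrated with a toy numpy sketch. Everything here is an illustrative assumption, not the paper's actual method: a single linear layer stands in for the LLM, the two projections `P` and `Q`, the MSE activation-matching objective, and the plain gradient-descent loop are all hypothetical. The point it demonstrates is that the frozen teacher weight is reparameterised through trainable low-rank projections, and only those projections are updated so the student's activations match the teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 64, 16, 256           # teacher width, student width, batch size

# Frozen teacher weight and a batch of inputs.
W_t = rng.normal(size=(D, D)) / np.sqrt(D)
X = rng.normal(size=(D, n))
H_t = W_t @ X                   # teacher activations the student must match

# Trainable low-rank projections -- the ONLY parameters that get updated.
P = rng.normal(size=(d, D)) / np.sqrt(D)   # down-projection into student width
Q = rng.normal(size=(D, d)) / np.sqrt(d)   # up-projection back to teacher width

lr, losses = 0.02, []
for _ in range(400):
    M = P @ W_t                 # low-rank "cloned" student weight, shape (d, D)
    H_s = Q @ (M @ X)           # student activations mapped back to teacher space
    R = H_s - H_t               # residual against the teacher's activations
    losses.append(0.5 * np.sum(R * R) / n)   # MSE activation-matching loss
    G = R @ X.T / n             # dLoss/d(Q @ M)
    Q -= lr * G @ M.T           # chain rule through W_s = Q @ P @ W_t;
    P -= lr * Q.T @ G @ W_t.T   # W_t itself is never updated
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Only `P` and `Q` (2·d·D parameters versus D·D for the teacher layer) receive gradients, which is the intuition behind the large reduction in training tokens: the optimisation problem is far smaller than training the student's weights from scratch.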