PhD student, Peking University
1 paper at NeurIPS 2025
A choice-theoretic loss that distills knowledge by teaching the student the teacher's class ranking under the Plackett–Luce model. The true label is kept as the top choice, so label supervision and distillation are unified in a single objective with no additional cross-entropy term.
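The sketch below shows how such a ranking loss could work. It is an illustrative, hedged reading of the summary above and not the paper's implementation. It assumes that the target ranking puts the true label first and then orders the remaining classes by descending teacher logit, and that the loss is the plain Plackett–Luce negative log-likelihood of that ranking under the student's logits. The paper may add per-position weights, temperatures, or truncation. The helper names (teacher_ranking, plackett_luce_nll, pl_distillation_loss) are hypothetical. Under the Plackett–Luce model the first factor of the likelihood is the softmax probability of the top-ranked class. When that class is the true label, this first term equals the usual cross-entropy, which is one way to see why no separate cross-entropy term is needed.

```python
import math
from typing import List, Sequence


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def teacher_ranking(teacher_logits: Sequence[float], true_label: int) -> List[int]:
    """Hypothetical target ranking: true label first, then the other classes
    sorted by the teacher's logits in descending order (an assumption)."""
    rest = sorted(
        (c for c in range(len(teacher_logits)) if c != true_label),
        key=lambda c: teacher_logits[c],
        reverse=True,
    )
    return [true_label] + rest


def plackett_luce_nll(student_logits: Sequence[float], ranking: Sequence[int]) -> float:
    """Negative log-likelihood of `ranking` under a Plackett-Luce model whose
    item scores are exp(student_logits).

    NLL = sum_k [ logsumexp(s[ranking[k:]]) - s[ranking[k]] ]
    """
    scores = [student_logits[c] for c in ranking]
    loss = 0.0
    suffix = -math.inf  # log-sum-exp of scores[k:], accumulated from the back
    for s in reversed(scores):
        suffix = _logaddexp(suffix, s)
        loss += suffix - s  # the last position contributes log(1) = 0
    return loss


def pl_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
) -> float:
    """Hypothetical single-objective distillation loss: the student is trained
    to reproduce the (label-anchored) teacher ranking."""
    return plackett_luce_nll(student_logits, teacher_ranking(teacher_logits, true_label))


if __name__ == "__main__":
    student = [2.0, 1.0, 0.0]
    teacher = [0.0, 3.0, 1.0]
    print(teacher_ranking(teacher, true_label=0))
    print(pl_distillation_loss(student, teacher, true_label=0))
```

With only two classes the ranking is fully determined by the true label, and the loss reduces exactly to softmax cross-entropy. With more classes, the later terms add the teacher's ordering of the non-target classes as extra supervision.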