1 paper across 1 session
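A choice-theoretic distillation loss that teaches the student the teacher's ranking of classes under the Plackett–Luce model, with the true label fixed as the top choice. Because the first Plackett–Luce factor is then exactly the softmax cross-entropy on the true label, a single loss covers both supervision and distillation, and no additional cross-entropy term is needed.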
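A minimal sketch of this kind of loss follows. It assumes the target ranking is the true label followed by the remaining classes in descending order of teacher logit. The loss is the Plackett–Luce negative log-likelihood of that ranking under the student's logits. The function names and the optional per-position `weights` argument are hypothetical illustrations, not the paper's API. The paper may weight or truncate positions differently.

```python
import math
from typing import List, Optional, Sequence


def teacher_ranking(teacher_logits: Sequence[float], label: int) -> List[int]:
    """Hypothetical helper: true label first, then the other classes
    sorted by descending teacher logit."""
    rest = sorted(
        (c for c in range(len(teacher_logits)) if c != label),
        key=lambda c: -teacher_logits[c],
    )
    return [label] + rest


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_nll(
    student_logits: Sequence[float],
    ranking: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Negative log-likelihood of `ranking` under a Plackett-Luce model
    whose item scores are the student's logits.

    Each position k contributes -log softmax over the classes not yet
    chosen, evaluated at ranking[k]. `weights` is an assumed, optional
    per-position weighting (all ones if omitted).
    """
    n = len(ranking)
    if weights is None:
        weights = [1.0] * n
    loss = 0.0
    for k in range(n):
        remaining = [student_logits[c] for c in ranking[k:]]
        log_p = student_logits[ranking[k]] - logsumexp(remaining)
        loss -= weights[k] * log_p
    return loss


if __name__ == "__main__":
    teacher = [0.1, 2.0, 1.0]
    student = [1.0, 2.0, 0.5]
    ranking = teacher_ranking(teacher, label=0)
    print("ranking:", ranking)
    print("PL loss:", plackett_luce_nll(student, ranking))
```

If `weights` is set to `[1, 0, ..., 0]`, only the first factor remains. That factor is ordinary cross-entropy on the true label. This is why the full ranking loss can replace a separate cross-entropy term.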