2 papers across 2 sessions
We show that interval-estimation-based methods produce better distilled embedders in multi-teacher distillation settings than MSE- or cosine-based methods.