Postdoc, Carnegie Mellon University
2 papers at NeurIPS 2025
Antidistillation sampling strategically modifies a model's next-token probability distribution to poison reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility.
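A minimal sketch of the core idea, not the paper's exact method: sample from a next-token distribution whose logits are penalized by a per-token score of how useful that token would be in a distillation trace. Here `distill_penalty` is a hypothetical stand-in for such a proxy, and `lam` trades off poisoning strength against utility.

```python
import torch

def antidistill_sample(logits: torch.Tensor,
                       distill_penalty: torch.Tensor,
                       lam: float = 1.0) -> int:
    """Sample one token from a poisoned next-token distribution.

    logits:          (vocab,) teacher next-token logits
    distill_penalty: (vocab,) hypothetical per-token score of how much
                     the token would help a student distill the teacher
    lam:             trade-off between poisoning and practical utility
    """
    # Downweight tokens that would make the trace useful for distillation,
    # leaving the rest of the distribution (and the model's usable
    # behavior) largely intact.
    poisoned_logits = logits - lam * distill_penalty
    probs = torch.softmax(poisoned_logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```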
We show that transformers achieve length generalization when trained jointly on short instances of the main task and longer instances of auxiliary tasks.
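A minimal sketch of the training mix, with hypothetical tasks: the main task is only seen at short lengths, while an auxiliary task supplies long sequences; at test time the main task is evaluated at lengths beyond those seen in training.

```python
import random

def make_main_example(max_len: int = 10):
    # Short main-task instance (hypothetical task: mod-10 sum of digits).
    n = random.randint(1, max_len)
    xs = [random.randint(0, 9) for _ in range(n)]
    return ("MAIN", xs, sum(xs) % 10)

def make_aux_example(max_len: int = 40):
    # Longer auxiliary-task instance (hypothetical task: copying).
    n = random.randint(1, max_len)
    xs = [random.randint(0, 9) for _ in range(n)]
    return ("AUX", xs, xs)

def make_mixed_batch(batch_size: int = 32, aux_frac: float = 0.5):
    # Train on both tasks together; evaluate MAIN at lengths > 10.
    return [make_aux_example() if random.random() < aux_frac
            else make_main_example()
            for _ in range(batch_size)]
```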