3 papers across 3 sessions
We propose a distillation framework for training language model anonymizers that achieve effective anonymization via iterative self-refinement.
We propose BREAD, a novel and effective variant of GRPO that bridges supervised learning and reinforcement learning by employing branch rollouts from expert traces.
A dataset of millions of diverse synthetic stories that leads to better small language models.