1 paper across 1 session
Through theoretical models and empirical testbeds, we characterize the algorithmic tradeoff between privileged expert distillation and reinforcement learning (RL), and identify better options for expert distillation.