1 paper across 1 session
We introduce a reinforcement learning (RL) algorithm that uses reparameterization and distance-based diversity regularization to train intractable multimodal policies for diversity-critical tasks.
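
To make the two ingredients named above concrete, here is a minimal sketch and not the paper's algorithm. It assumes a hypothetical one-layer tanh policy that maps a state and a noise vector to an action (the reparameterization), and it takes the mean pairwise Euclidean distance between actions sampled for the same state as the diversity term. The function names, the policy form, the choice of distance, and the weight `beta` are all illustrative assumptions.

```python
import math
import random


def sample_action(weights, state, noise):
    """Reparameterized sample: the action is a deterministic function of
    (state, noise), so all randomness comes from the noise input.
    Hypothetical policy: a = tanh(W @ [state; noise])."""
    inputs = list(state) + list(noise)
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def pairwise_distance_bonus(actions):
    """Mean Euclidean distance over all distinct pairs of actions.
    Requires only samples, not the policy's likelihood."""
    k = len(actions)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total += math.dist(actions[i], actions[j])
    return total / (k * (k - 1) / 2)


def regularized_objective(returns, actions, beta):
    """Average return plus beta times the diversity bonus (assumed form)."""
    return sum(returns) / len(returns) + beta * pairwise_distance_bonus(actions)


if __name__ == "__main__":
    rng = random.Random(0)
    state_dim, noise_dim, action_dim, num_samples = 3, 2, 2, 4
    # Random illustrative weights; a real policy would be trained.
    weights = [[rng.gauss(0, 1) for _ in range(state_dim + noise_dim)]
               for _ in range(action_dim)]
    state = [0.5, -0.2, 0.1]
    # Several noise draws for the same state give several candidate actions.
    actions = [sample_action(weights, state, [rng.gauss(0, 1) for _ in range(noise_dim)])
               for _ in range(num_samples)]
    print(pairwise_distance_bonus(actions))
```

The bonus is computed from sampled actions alone, which is why a distance-based term of this kind remains usable when the policy's density cannot be evaluated. In a gradient-based setting the same computation would be written in an autodiff framework so that the bonus can be differentiated through the sampled actions.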