PhD student, The Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
We introduce an RL algorithm that uses reparameterization and distance-based diversity regularization to train multimodal policies with intractable likelihoods for diversity-critical tasks.