MS student, Tianjin University
2 papers at NeurIPS 2025
We introduce an RL algorithm that uses reparameterization and distance-based diversity regularization to train intractable multimodal policies for diversity-critical tasks.
Measuring neuronal activity via activations is ineffective in complex agents, because activation values do not reflect a neuron's true learning capacity. We introduce GraMa, which robustly quantifies neuronal activity and guides neuron resetting across various network architectures.