2 papers across 2 sessions
A systematic multimodal RL framework that improves policy exploration and advantage estimation.