We introduce CQN-AS, a value-based reinforcement learning (RL) algorithm that trains a critic network to output Q-values over a sequence of actions.
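To make "Q-values over a sequence of actions" concrete, here is a minimal sketch of what such a critic's output could look like. It is not the CQN-AS implementation: the two-layer network, the discretization of each action dimension into bins, and all names (`init_critic`, `q_values`, `greedy_action_sequence`, `num_bins`) are assumptions made for illustration, and no training step is shown.

```python
import numpy as np

def init_critic(obs_dim, seq_len, action_dim, num_bins, hidden=32, seed=0):
    """Random weights for a tiny two-layer critic (hypothetical architecture)."""
    rng = np.random.default_rng(seed)
    out_dim = seq_len * action_dim * num_bins
    return {
        "w1": rng.normal(0.0, 0.1, (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
        "shape": (seq_len, action_dim, num_bins),
    }

def q_values(params, obs):
    """Return Q-values of shape (seq_len, action_dim, num_bins) for one observation."""
    h = np.maximum(obs @ params["w1"] + params["b1"], 0.0)  # ReLU hidden layer
    q = h @ params["w2"] + params["b2"]
    return q.reshape(params["shape"])

def greedy_action_sequence(params, obs, low=-1.0, high=1.0):
    """Pick the highest-valued bin per step and dimension, mapped to bin centers.

    Assumes each action dimension is discretized uniformly over [low, high].
    """
    q = q_values(params, obs)
    num_bins = q.shape[-1]
    best = q.argmax(axis=-1)  # shape (seq_len, action_dim)
    centers = low + (np.arange(num_bins) + 0.5) * (high - low) / num_bins
    return centers[best]

params = init_critic(obs_dim=4, seq_len=3, action_dim=2, num_bins=5)
obs = np.ones(4)
actions = greedy_action_sequence(params, obs)  # one action per future step
```

The point of the sketch is only the output shape: a single forward pass scores every step of the action sequence at once, rather than one action at a time.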