We propose a framework in which a meta-controller learns to coordinate offline learning during 'sleep' phases so as to maximise reward in an 'awake' phase. The meta-controller chooses among actions that each correspond to a type of offline process in the brain.
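To make the sleep/awake loop concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes an epsilon-greedy bandit as the meta-controller's learning rule, placeholder action names (`process_a`, `process_b`, `process_c`), and an invented awake-phase reward function; none of these details come from the abstract.

```python
import random

# Hypothetical labels for offline processes; the source does not name them.
OFFLINE_ACTIONS = ["process_a", "process_b", "process_c"]


def awake_reward(sleep_actions, rng):
    """Toy stand-in for the awake phase.

    Reward depends on which offline actions were taken during sleep.
    The payoffs below are invented purely for illustration.
    """
    payoff = {"process_a": 0.2, "process_b": 1.0, "process_c": 0.5}
    mean = sum(payoff[a] for a in sleep_actions) / len(sleep_actions)
    return mean + rng.gauss(0.0, 0.1)  # noisy awake-phase outcome


class MetaController:
    """Epsilon-greedy bandit over offline actions (an assumed learning rule)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}  # running mean reward
        self.count = {a: 0 for a in self.actions}

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Incremental running-mean update of the action's value estimate.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def train(n_cycles=500, sleep_steps=4, seed=0):
    """Alternate sleep phases (choose offline actions) and awake phases (reward)."""
    env_rng = random.Random(seed + 1)
    controller = MetaController(OFFLINE_ACTIONS, seed=seed)
    for _ in range(n_cycles):
        sleep_actions = [controller.choose() for _ in range(sleep_steps)]
        reward = awake_reward(sleep_actions, env_rng)
        # Credit every action taken in this sleep phase with the awake reward.
        for action in sleep_actions:
            controller.update(action, reward)
    return controller
```

Calling `train()` returns a controller whose `value` dictionary holds its learned estimate of how much each offline action contributes to awake-phase reward. A richer meta-controller could condition its choice on the agent's internal state, which this sketch deliberately omits.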