Postdoc, University of Oxford
1 paper at NeurIPS 2025
We propose a framework in which a meta-controller learns to coordinate offline learning during 'sleep' phases so as to maximise reward in an 'awake' phase. At each step it chooses among actions that each correspond to a type of offline process in the brain.