2 papers across 2 sessions
A scalable, simple, and practical algorithm for model-based RL, with regret bounds across several RL settings and experiments on state-based, visual control, and hardware tasks.
We identify the connection between the encoder's output and the ensuing dense layers as the main underlying factor limiting scaling capabilities in deep RL.