5 papers across 3 sessions
This work presents planning and learning algorithms for average-cost MDPs with dynamic risk measures.
Actor-free Q-learning in Continuous Action Spaces by Learning a "Wire-fitted Q-function"