BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Matthew Landers, Taylor W. Killian, Hugo Barnes, Thomas Hartvigsen, Afsaneh Doryab

Keywords: Offline Reinforcement Learning, Batch Reinforcement Learning, Combinatorial Action Spaces, Reinforcement Learning, Discrete Action Spaces, Large Action Spaces, Sequential Decision Making

NeurIPS · Poster · OpenReview

Abstract

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects.
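Both difficulties can be illustrated with a short, hypothetical example. The joint action space is the Cartesian product of the sub-action spaces, so its size is the product of the sub-action cardinalities. Twenty-two binary sub-actions, an illustrative configuration that is not taken from the paper's environments, already give more than four million joint actions. The second function assumes one common additive factorization, Q(a1, a2) = q1[a1] + q2[a2]. Under that factorization the second-order interaction between two sub-actions is always zero, so a joint effect such as the synergy in `joint` cannot be represented. The names `joint_action_count` and `interaction_term` are illustrative only.

```python
from math import prod
from typing import Dict, Sequence, Tuple


def joint_action_count(sub_action_sizes: Sequence[int]) -> int:
    """Size of the joint action space: the product of sub-action cardinalities."""
    return prod(sub_action_sizes)


def interaction_term(q: Dict[Tuple[int, int], float]) -> float:
    """Second-order interaction between two binary sub-actions.

    It is exactly zero for any Q of the additive form q1[a1] + q2[a2], so a
    nonzero value is a joint effect that such a factorization cannot express.
    """
    return q[(1, 1)] - q[(1, 0)] - q[(0, 1)] + q[(0, 0)]


# Hypothetical setting: 22 binary sub-actions (2**22 joint actions).
print(joint_action_count([2] * 22))  # 4194304

# Assumed additive factorization versus a joint Q with a synergy term.
q1, q2 = [0.0, 1.0], [0.0, 2.0]
factorized = {(a1, a2): q1[a1] + q2[a2] for a1 in (0, 1) for a2 in (0, 1)}
joint = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 5.0}
print(interaction_term(factorized))  # 0.0
print(interaction_term(joint))       # 3.0
```

Exhaustive evaluation must score every one of the 4,194,304 joint actions. The additive model is cheap, but it is blind to the interaction term.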

We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a number of joint actions that scales linearly, rather than exponentially, with the number of sub-actions, while preserving the dependency structure among sub-actions. BraVE outperforms prior offline RL methods by up to 20× in environments with over four million actions.
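The sketch below illustrates why a tree-structured traversal touches only a linear number of joint actions. It is a generic greedy descent written under stated assumptions and is not BraVE's actual traversal or training procedure. Each tree depth fixes one sub-action, and undecided sub-actions default to 0, so every node is a valid joint action (an assumption). At each depth, the children are scored by `branch_value_fn`, a hypothetical stand-in for a learned branch value. The search descends into the best-scoring child and keeps the highest-Q joint action seen along the path. All names here are illustrative.

```python
from math import prod
from typing import Callable, List, Sequence, Tuple

Action = Tuple[int, ...]


def greedy_tree_search(
    sub_action_sizes: Sequence[int],
    q_fn: Callable[[Action], float],
    branch_value_fn: Callable[[Action, int], float],
) -> Tuple[Action, int]:
    """Descend one tree level per sub-action (hypothetical sketch).

    Returns the best joint action seen along the path and the number of
    children scored by branch_value_fn, which equals sum(sub_action_sizes).
    """
    action: List[int] = [0] * len(sub_action_sizes)  # assumption: default sub-action is 0
    best_action: Action = tuple(action)
    best_q = q_fn(best_action)
    n_scored = 0
    for depth, size in enumerate(sub_action_sizes):
        # Each child fixes sub-action `depth` to one of its `size` values.
        children = []
        for choice in range(size):
            child = list(action)
            child[depth] = choice
            children.append(tuple(child))
        scores = [branch_value_fn(child, depth) for child in children]
        n_scored += len(children)
        # Descend into the highest-scoring child (first one on ties).
        chosen = children[max(range(size), key=scores.__getitem__)]
        chosen_q = q_fn(chosen)
        if chosen_q > best_q:
            best_action, best_q = chosen, chosen_q
        action = list(chosen)
    return best_action, n_scored


# Toy usage: a separable Q, with the branch value taken as the child's own Q.
sizes = [2] * 22


def q_fn(a: Action) -> float:
    return float(sum(a))


best, n_scored = greedy_tree_search(sizes, q_fn, lambda child, depth: q_fn(child))
print(best == (1,) * 22, n_scored, prod(sizes))  # True 44 4194304
```

In this toy case the descent scores 44 children instead of 4,194,304 joint actions. With a separable Q, greedy choices are optimal. When sub-actions interact, the quality of the result depends entirely on how well `branch_value_fn` anticipates what lies deeper in the tree. This sketch makes no claim about how BraVE learns that quantity.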