Poster Session 5 · Friday, December 5, 2025 11:00 AM → 2:00 PM
#2410

Reward-oriented Causal Representation Learning

NeurIPS OpenReview

Abstract

Causal representation learning (CRL) aims to disentangle the latent, low-dimensional, causally related generating factors underlying high-dimensional observable data. Extensive recent studies have characterized CRL identifiability, i.e., perfect recovery of the latent variables and their attendant causal graph.
This paper introduces the notion of reward-oriented CRL, which moves away from perfectly learning the latent representation and toward learning it only to the extent needed to optimize a desired downstream task (reward). In reward-oriented CRL, perfect recovery of the latent representation can be excessive; instead, the representation need only be learned at the coarsest level sufficient for optimizing the desired task. Reward-oriented CRL is formalized as the optimization of a desired function of the observable data over the space of all possible interventions, with a focus on linear causal and transformation models.
To sequentially identify the optimal subset of interventions, an adaptive exploration algorithm is designed that learns the latent causal graph and the variables needed to identify the best intervention. For an $n$-dimensional latent space and a $d$-dimensional observation space, over a horizon of $T$ interventions, the algorithm is shown to achieve a regret bound whose rate is governed by a quantity measuring the total uncertainty in the graph estimates. Furthermore, an almost-matching lower bound is established, in which this uncertainty measure is replaced by a quantity counting the number of causal paths in the graph.
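As a hedged illustration only (not the paper's algorithm), the idea of adaptively searching over interventions to maximize a reward on the observations can be sketched as a UCB-style bandit over a finite set of atomic do-interventions in a toy linear SCM. Every model, parameter, and function name below is an assumption introduced for this sketch.

```python
# Toy sketch: adaptive exploration over atomic do-interventions in a
# linear SCM with a linear observation map. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

n, d = 3, 5                      # latent / observation dimensions (assumed)
A = np.array([[0.0, 0.0, 0.0],   # strictly lower-triangular => DAG z1 -> z2 -> z3
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0]])
G = rng.normal(size=(d, n))      # linear mixing: observation x = G z + noise
w = rng.normal(size=d)           # reward is a linear function of the observation

def sample(intervention=None, sigma=0.1):
    """Ancestral sampling; `intervention` = (node, value) applies do(z_node = value)."""
    z = np.zeros(n)
    for i in range(n):
        z[i] = A[i] @ z + rng.normal(scale=sigma)
        if intervention is not None and intervention[0] == i:
            z[i] = intervention[1]  # do-intervention overrides the mechanism
    return G @ z + rng.normal(scale=sigma, size=d)

# Candidate interventions: set each latent node to -1 or +1.
arms = [(i, v) for i in range(n) for v in (-1.0, 1.0)]

def ucb_explore(T=2000, c=2.0):
    """UCB over the finite intervention set; returns index of the best arm found."""
    counts = np.zeros(len(arms))
    means = np.zeros(len(arms))
    for t in range(1, T + 1):
        ucb = means + c * np.sqrt(np.log(t + 1) / np.maximum(counts, 1))
        ucb[counts == 0] = np.inf          # pull each arm at least once
        a = int(np.argmax(ucb))
        r = float(w @ sample(arms[a]))     # observed reward for this intervention
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return int(np.argmax(means))

best = ucb_explore()
```

This sketch treats each intervention as a bandit arm; the paper's algorithm goes further by exploiting the latent causal structure rather than treating arms as unrelated.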