6 papers across 2 sessions
We propose a computationally tractable multinomial logit contextual bandit algorithm designed to handle generic non-linear parametric utility functions.
This study raises and addresses the problem of time-delayed feedback in learning in games.