We design a PAC-learner for contextual combinatorial semi-bandits with sparse rewards, with a sample complexity bound that scales primarily with the sparsity parameter rather than with the number of arms.