We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB) in an approximation regret setting. Although CTS has attracted considerable interest, it has a drawback that other standard CMAB policies do not share when the oracle is non-exact: for some oracles, CTS suffers poor approximation regret, scaling linearly with the time horizon $T$ [Wang and Chen, 2018]. It is therefore necessary to characterize the oracles with which CTS can learn. Kong et al. [2021] began this study by giving the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order $\mathcal{O}(\log(T)/\Delta^2)$, where $\Delta$ is some minimal reward gap. In this paper, we extend this study beyond the simple case of the greedy oracle. We provide the first $\mathcal{O}(\log(T)/\Delta)$ approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle that allows a reduction to the exact oracle analysis. We therefore term this condition REDUCE2EXACT, and observe that it is satisfied in many concrete examples. Moreover, it can be extended to the probabilistically triggered arms setting, thus capturing even more problems, such as online influence maximization.