Non-stationarity is ubiquitous in human behavior, and addressing it in contextual bandits is challenging. Several works have tackled this problem by investigating semi-parametric contextual bandits, warning that ignoring non-stationarity can harm performance. Another prevalent aspect of human behavior is social interaction, which has become available in the form of a social network or graph structure. As a result, graph-based contextual bandits have received much attention. In this paper, we propose "SemiGraphTS," a novel contextual Thompson-sampling algorithm for a graph-based semi-parametric reward model. Our algorithm is the first to be proposed in this setting. We derive an upper bound on the cumulative regret that can be expressed as the product of a factor depending on the graph structure and a factor of the same order as the regret bound for the semi-parametric model without a graph. We evaluate the proposed and existing algorithms via simulations and a real-data example.
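The abstract does not spell out the algorithm itself, so the following is a minimal, hypothetical sketch of how Thompson sampling could combine a graph prior over per-user parameters with an action-centering step for a semi-parametric (non-stationary intercept) reward; the class name GraphSemiparamTS, the Laplacian-coupled prior, and the centering-by-arm-mean simplification are illustrative assumptions, not the paper's SemiGraphTS.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

class GraphSemiparamTS:
    """Illustrative Thompson sampling for a semi-parametric reward
    r_t = nu_t + x^T theta_u + noise, with per-user parameters theta_u
    coupled through a graph Laplacian prior.  A generic sketch, not the
    SemiGraphTS algorithm of the paper."""

    def __init__(self, n_users, dim, adj, lam=1.0, v=0.1):
        self.n, self.d = n_users, dim
        # Prior precision ties neighboring users' parameters together.
        L = laplacian(adj) + np.eye(n_users)
        self.B = lam * np.kron(L, np.eye(dim))   # (n*d, n*d) precision
        self.y = np.zeros(n_users * dim)         # accumulated z * reward
        self.v = v                               # posterior scale

    def _block(self, user, x):
        """Embed a d-dim context into the stacked (n*d)-dim space."""
        z = np.zeros(self.n * self.d)
        z[user * self.d:(user + 1) * self.d] = x
        return z

    def choose(self, user, contexts):
        """contexts: (n_arms, d) array; returns the chosen arm index."""
        mean = np.linalg.solve(self.B, self.y)
        theta = np.random.multivariate_normal(
            mean, self.v ** 2 * np.linalg.inv(self.B))
        scores = contexts @ theta[user * self.d:(user + 1) * self.d]
        return int(np.argmax(scores))

    def update(self, user, contexts, arm, reward):
        """Centering the chosen context removes the shared intercept nu_t;
        centering by the plain arm mean is a simplification of the usual
        policy-expectation centering in semi-parametric bandits."""
        x_centered = contexts[arm] - contexts.mean(axis=0)
        z = self._block(user, x_centered)
        self.B += np.outer(z, z)
        self.y += reward * z
```

In this sketch the graph only enters through the prior precision, so users with many shared edges borrow statistical strength from each other while the centering step guards the estimate of theta against the time-varying term.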