This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: \emph{contexts} and \emph{side observations}. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of order $\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log{K}})$ over the time horizon $T$, where $\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.
翻译:本文研究对抗性图形背景土匪 { 对抗性多武装土匪的变体 { 对抗性多武装土匪的变体 { 调用两种最常见的侧面信息 :\ emph{ contexts} 和\ emph{ 侧面观察} 。 在此背景下, 学习代理商在用 $d$- 维的上下文矢量展示后, 反复从一套 $K 的行动中选择。 该代理商不仅引起并观察所选行动的丢失, 而且还观察观察结构中的相邻行动损失, 以一系列反馈图表编码编码 。 这设置了社交网络中的各种应用模型, 既可以提供上下文和图形结构的侧面观察 。 根据\ ttt{ EXP3} 来开发两种高效的算法。 在轻微条件下, 我们的分析显示, 第一个算法,\ pexttrtal{ kr> 的算法结果, 以 =xxxal=xxxxxx 的算法 。