Stochastic linear contextual bandit algorithms have substantial practical applications, such as recommender systems, online advertising, and clinical trials. Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in their presence. Existing robust bandit algorithms only handle reward attacks in the non-contextual setting and cannot improve robustness in the general and popular contextual bandit environment. Moreover, none of the existing methods can defend against attacked contexts. In this work, we provide the first robust bandit algorithm for the stochastic linear contextual bandit setting under a fully adaptive and omniscient attack. Our algorithm works not only under reward attacks but also under attacked contexts. Furthermore, it requires no information about the attack budget or the particular form of the attack. We provide theoretical guarantees for the proposed algorithm and show through extensive experiments that it significantly improves robustness against various popular attacks.
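To make the threat model concrete, the following is a minimal sketch (not the paper's algorithm) of the stochastic linear contextual bandit setting with a budget-limited reward attacker: a standard LinUCB-style learner selects arms from stochastic contexts, while an adversary flips the sign of observed rewards until an assumed corruption budget is exhausted. All names, the sign-flipping attack, and the parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000            # context dimension, arms per round, horizon
theta = rng.normal(size=d)      # unknown reward parameter (assumed unit norm)
theta /= np.linalg.norm(theta)

budget = 50.0                   # attacker's total corruption budget (illustrative)
spent = 0.0

# LinUCB-style estimator: ridge regression plus an ellipsoidal confidence bonus
lam, alpha = 1.0, 1.0
A = lam * np.eye(d)             # regularized Gram matrix
b = np.zeros(d)

regret = 0.0
for t in range(T):
    # Stochastic contexts: one unit-norm feature vector per arm
    X = rng.normal(size=(K, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    bonus = np.sqrt(np.einsum('kd,de,ke->k', X, A_inv, X))
    a = int(np.argmax(X @ theta_hat + alpha * bonus))

    x = X[a]
    true_mean = x @ theta
    reward = true_mean + 0.1 * rng.normal()

    # Adversarial reward attack: flip the observed reward's sign,
    # paying |perturbation| against the budget, while the budget lasts
    cost = abs(2 * reward)
    if spent + cost <= budget:
        spent += cost
        reward = -reward

    A += np.outer(x, x)
    b += reward * x
    regret += (X @ theta).max() - true_mean

print(f"corruption spent: {spent:.1f}, cumulative regret: {regret:.1f}")
```

Under this kind of attack a vanilla LinUCB learner keeps accumulating regret long after an unattacked run would have converged, which is the vulnerability the abstract refers to; a robust algorithm would aim to keep regret growing only by an additive term in the (unknown) budget.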