We study adversarial attacks on linear stochastic bandits, a sequential decision-making problem with many important applications in recommender systems, online advertising, and medical treatment. By manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this observation, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a complete characterization of the necessary and sufficient conditions for attackability, based on the geometry of the context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first asserts whether the current environment is attackable, and if so, modifies the rewards to force the algorithm to pull a target arm a linear number of times while incurring only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed method.
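To make the reward-manipulation idea concrete, the following is a minimal sketch (not the paper's actual two-stage method) of an adversary poisoning rewards against a standard LinUCB learner. The arm features, the true parameter, the attack budget of subtracting a fixed amount from non-target pulls, and all constants are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed linear bandit: fixed arm feature vectors and true parameter.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
theta_star = np.array([1.0, 0.5])   # arm 1 is NOT optimal without attack
target = 1                           # arm the adversary wants pulled
d, lam, alpha, T = 2, 1.0, 1.0, 2000

A = lam * np.eye(d)                  # ridge-regression Gram matrix
b = np.zeros(d)
pulls = np.zeros(3, dtype=int)

for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    # LinUCB index: estimated reward plus exploration bonus.
    ucb = X @ theta_hat + alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
    arm = int(np.argmax(ucb))
    reward = X[arm] @ theta_star + 0.1 * rng.standard_normal()
    # Naive attack sketch: depress the observed reward of every non-target arm,
    # so the learner's estimates make the target arm look best.
    if arm != target:
        reward -= 1.0
    A += np.outer(X[arm], X[arm])
    b += X[arm] * reward
    pulls[arm] += 1

print(pulls)  # the target arm accumulates the vast majority of pulls
```

Because the arms are correlated (arm 2 shares direction with the target), depressing non-target rewards also biases the learner's estimate along shared directions; this is exactly the coupling that can make some target arms unattackable, which the paper's geometric condition characterizes.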