Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption: delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward; indeed, even naive baseline reward-delay attacks are highly successful in minimizing reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker's targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards while preserving their order.
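To make the threat model concrete, the sketch below shows one way an order-preserving reward-delay adversary could be interposed between the environment and a tabular Q-learner. This is a minimal illustration under assumed details, not the paper's implementation: the delay budget `max_delay`, the zero placeholder reward, the chain environment, and the learning hyperparameters are all hypothetical choices made only for the example.

```python
import random
from collections import deque, defaultdict

class OrderPreservingDelayAttacker:
    """Hypothetical attacker: holds rewards back for up to `max_delay` steps
    and releases them in their original order, so each Q-update is paired
    with a stale reward rather than the reward it actually earned."""
    def __init__(self, max_delay=3):
        self.max_delay = max_delay
        self.buffer = deque()

    def perturb(self, reward):
        self.buffer.append(reward)
        # Emit a placeholder of 0 until the buffer fills, then release
        # rewards in order but shifted `max_delay` steps late.
        if len(self.buffer) > self.max_delay:
            return self.buffer.popleft()
        return 0.0

def q_learning_with_delay(env_step, env_reset, n_actions, attacker,
                          episodes=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Standard tabular Q-learning, except the reward used in the update
    is whatever the attacker chooses to release at that step."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda i: Q[state][i]))
            next_state, true_reward, done = env_step(state, a)
            observed = attacker.perturb(true_reward)   # delayed reward signal
            target = observed + gamma * max(Q[next_state])
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state
    return Q

# Toy 5-state chain environment (hypothetical) used only to exercise the sketch.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = q_learning_with_delay(chain_step, lambda: 0, n_actions=2,
                          attacker=OrderPreservingDelayAttacker(max_delay=3))
```

Even this order-preserving attacker decouples each state-action pair from its true reward, which is exactly the kind of misattribution the abstract reports remains effective against the minimal "no out-of-sequence rewards" mitigation.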