Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have succeeded via perturbations that assume control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning that causes agent misbehavior only at specific target states, all while minimally modifying a small fraction of training observations and assuming no control over the policy or rewards. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate its success on two Atari games of varying difficulty.
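The abstract only names gradient alignment, so a minimal sketch may help. In the gradient-matching approach introduced for supervised poisoning (Witches' Brew, Geiping et al. 2021), which appears to be the technique referenced here, the attacker perturbs a small set of training inputs so that the victim's training gradient aligns, in cosine similarity, with the gradient of an adversarial loss encoding the desired misbehavior; matching direction rather than magnitude makes the poison insensitive to the victim's learning rate. The PyTorch sketch below is illustrative only: `train_loss_fn`, `target_loss_fn`, and all optimization constants are assumed placeholders, not the paper's implementation.

```python
import torch

def alignment_loss(model, clean_obs, delta, train_loss_fn, target_loss_fn):
    """Gradient-matching objective: 1 - cos(grad of the attacker's target
    loss, grad of the victim's training loss on poisoned observations)."""
    # Gradient the attacker wants the victim's update step to follow,
    # e.g. a loss forcing the wrong action at the target states
    # (target_loss_fn is an assumed helper, not from the paper).
    adv_grads = torch.autograd.grad(target_loss_fn(model),
                                    list(model.parameters()))

    # Gradient the victim actually computes on the poisoned batch;
    # create_graph=True keeps it differentiable with respect to delta.
    train_grads = torch.autograd.grad(train_loss_fn(model, clean_obs + delta),
                                      list(model.parameters()),
                                      create_graph=True)

    # Cosine similarity between the two flattened gradient vectors.
    dot, a_sq, t_sq = 0.0, 0.0, 0.0
    for g_a, g_t in zip(adv_grads, train_grads):
        dot += (g_a * g_t).sum()
        a_sq += g_a.pow(2).sum()
        t_sq += g_t.pow(2).sum()
    return 1.0 - dot / (a_sq.sqrt() * t_sq.sqrt() + 1e-12)

# Craft the perturbation with projected signed-gradient steps under an
# l-inf budget eps (step count and size here are illustrative):
# delta = torch.zeros_like(clean_obs, requires_grad=True)
# for _ in range(250):
#     loss = alignment_loss(model, clean_obs, delta,
#                           train_loss_fn, target_loss_fn)
#     grad, = torch.autograd.grad(loss, delta)
#     with torch.no_grad():
#         delta -= 0.01 * grad.sign()
#         delta.clamp_(-eps, eps)
```

Under this reading, adapting the technique to reinforcement learning would amount to choosing an RL training loss (e.g. a policy-gradient or Q-learning loss over observation batches) for `train_loss_fn` and a misbehavior-at-target-states loss for `target_loss_fn`; the bounded perturbation `delta` touches only a small fraction of observations, consistent with the threat model in the abstract.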