In this paper, we propose a new data poisoning attack and apply it to deep reinforcement learning agents. Our attack centers on what we call in-distribution triggers: triggers that are native to the data distributions the model will be trained on and deployed in. We outline a simple procedure for embedding these and other triggers in deep reinforcement learning agents following a multi-task learning paradigm, and demonstrate the attack in three common reinforcement learning environments. We believe this work has important implications for the security of deep learning models.