Reinforcement learning (RL) has advanced rapidly in recent years through the adoption of deep neural networks (DNNs) as policy networks. This effectiveness, however, comes with a serious vulnerability: small adversarial perturbations of the input can change the network's output. Several works have shown that an agent with a DNN policy network can be steered away from its original task through a sequence of small perturbations of its input states. In this paper, we further demonstrate that it is also possible to impose an arbitrary adversarial reward on the victim policy network through a sequence of such attacks. Our method builds on a recent adversarial attack technique, the Adversarial Transformer Network (ATN), which learns to generate the attack and is easy to integrate with the policy network. As a result of our attack, the victim agent is misled into optimising for the adversarial reward over time. Our results expose serious security threats for RL applications in safety-critical systems such as drones, medical analysis, and self-driving cars.
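The vulnerability referred to above can be illustrated with a minimal sketch: a single targeted step of the fast gradient sign method (FGSM) against a toy linear softmax "policy". This is purely illustrative and is not the paper's ATN-based attack; the weights, dimensions, and step size below are arbitrary choices for the example.

```python
import numpy as np

# Hypothetical toy policy: state dim 2 -> 2 discrete actions (not the
# paper's network; chosen so the perturbation's effect is easy to verify).
W = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

def logits(s):
    return s @ W

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_step(s, target, eps):
    # Cross-entropy loss toward the attacker's target action; for linear
    # logits z = s @ W the input gradient is W @ (softmax(z) - onehot).
    # We descend that loss with one L_inf-bounded signed-gradient step.
    p = softmax(logits(s))
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    return s - eps * np.sign(W @ (p - onehot))

s = np.array([1.0, 0.0])
assert int(np.argmax(logits(s))) == 0      # clean policy picks action 0
s_adv = fgsm_step(s, target=1, eps=0.6)    # per-coordinate change <= 0.6
print(int(np.argmax(logits(s_adv))))       # → 1: the attacker's action
```

A small bounded change to the state flips the chosen action; a sequence of such perturbations over an episode is what lets an attacker shape the agent's behaviour.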