Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations of their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods for improving the robustness of deep reinforcement learning agents to such perturbations by training in their presence (i.e. adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct parallel approaches, based on worst-case distributional shift and feature sensitivity, to investigate the outcomes of adversarial training on deep neural policies. For the first approach, we compare the Fourier spectra of minimal perturbations computed for adversarially trained and vanilla trained neural policies. Through experiments in the OpenAI Atari environments, we show that minimal perturbations computed for adversarially trained policies are more concentrated in the lower frequencies of the Fourier domain, indicating a higher sensitivity of these policies to low-frequency perturbations. For the second approach, we propose a novel method to measure the feature sensitivities of deep neural policies and use it to compare state-of-the-art adversarially trained and vanilla trained deep neural policies. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.
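As a rough illustration of the first approach, the sketch below computes the centered 2-D Fourier magnitude spectrum of a minimal perturbation (the difference between a perturbed and a clean observation) and a simple low-frequency energy ratio that could be compared between adversarially trained and vanilla trained policies. The function names, the 84x84 grayscale frame size, and the radius threshold are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
import numpy as np


def fourier_spectrum(perturbation: np.ndarray) -> np.ndarray:
    """Centered 2-D Fourier magnitude spectrum of a perturbation.

    `perturbation` is assumed to be a single 2-D grayscale frame, e.g. the
    difference between a perturbed and a clean 84x84 Atari observation.
    """
    spectrum = np.fft.fft2(perturbation)   # 2-D discrete Fourier transform
    spectrum = np.fft.fftshift(spectrum)   # shift the zero-frequency bin to the center
    return np.abs(spectrum)                # keep only the magnitude


def low_frequency_energy_ratio(perturbation: np.ndarray, radius: int = 10) -> float:
    """Fraction of spectral energy inside a low-frequency disk of the given radius.

    A higher ratio means the perturbation is more concentrated in low frequencies.
    The radius is an arbitrary illustrative choice.
    """
    mag = fourier_spectrum(perturbation)
    h, w = mag.shape
    yy, xx = np.ogrid[:h, :w]
    # Boolean mask selecting frequency bins within `radius` of the spectrum center.
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return float(mag[mask].sum() / mag.sum())


if __name__ == "__main__":
    # Illustrative usage: a random array stands in for a minimal adversarial
    # perturbation computed against a trained policy's observation.
    delta = np.random.uniform(-1.0, 1.0, size=(84, 84)).astype(np.float32)
    print(low_frequency_energy_ratio(delta, radius=10))
```

Comparing this ratio (or the full spectra) across perturbations computed for adversarially trained versus vanilla trained policies would surface the kind of low-frequency concentration described in the abstract.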