Adversarial training is an effective method for training deep learning models that are resilient to norm-bounded perturbations, at the cost of a drop in nominal performance. While adversarial training appears to enhance the robustness and safety of a deep model deployed in open-world, decision-critical applications, counterintuitively, it induces undesired behaviors in robot learning settings. In this paper, we show theoretically and experimentally that neural controllers obtained via adversarial training are subject to three types of defects, namely transient, systematic, and conditional errors. We first generalize adversarial training to a safety-domain optimization scheme that allows for more generic specifications. We then prove that such a learning process tends to cause certain error profiles. We support our theoretical results with a thorough experimental safety analysis in a robot-learning task. Our results suggest that adversarial training is not yet ready for robot learning.