Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best-response strategy in such a game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments showing the non-convergence of adversarial training and the robustness of the Nash equilibrium.
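The code below is a minimal sketch of the gap between "best-response dynamics converge" and "a pure Nash equilibrium exists". It uses a hypothetical 3×3 zero-sum matrix game of our own construction, not the linear-classifier model studied in the paper. The matrix `A` and the helpers `pure_nash_equilibria`, `row_best_response`, `col_best_response`, and `alternating_best_response` are illustrative names. The row player maximizes `A[i][j]` (loosely, the adversary maximizing loss) and the column player minimizes it (loosely, the classifier). The game has a unique pure Nash equilibrium at `(2, 2)`. Yet alternating best responses started from column 0 cycle inside the top-left 2×2 "matching pennies" block and never reach the equilibrium.

```python
# Hypothetical toy zero-sum game (NOT the paper's model).
# Row player maximizes A[i][j]; column player minimizes it.
A = [
    [ 1.0, -1.0, -0.5],
    [-1.0,  1.0, -0.5],
    [ 0.5,  0.5,  0.0],
]


def pure_nash_equilibria(A):
    """Return all saddle points (i, j): A[i][j] is the max of column j and the min of row i."""
    n_rows = len(A)
    eq = []
    for i in range(n_rows):
        for j in range(len(A[0])):
            col_max = max(A[k][j] for k in range(n_rows))
            row_min = min(A[i])
            if A[i][j] == col_max and A[i][j] == row_min:
                eq.append((i, j))
    return eq


def row_best_response(A, j):
    # Maximizer picks the row with the largest payoff against column j.
    return max(range(len(A)), key=lambda i: A[i][j])


def col_best_response(A, i):
    # Minimizer picks the column with the smallest payoff against row i.
    return min(range(len(A[0])), key=lambda j: A[i][j])


def alternating_best_response(A, j0, steps):
    """Alternate best responses starting from column j0; record (row, col) after each round."""
    j = j0
    history = []
    for _ in range(steps):
        i = row_best_response(A, j)  # maximizer responds to the current column
        j = col_best_response(A, i)  # minimizer responds to that row
        history.append((i, j))
    return history


if __name__ == "__main__":
    print(pure_nash_equilibria(A))                       # [(2, 2)]
    print(alternating_best_response(A, j0=0, steps=4))   # [(0, 1), (1, 0), (0, 1), (1, 0)]
```

Starting from column 2 instead, the dynamics stay at `(2, 2)`. In this toy example, whether alternating best response reaches the equilibrium therefore depends on the starting point. The paper's non-convergence result concerns its own linear-classifier setting.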