Adversarial training is one of the most popular methods for training models that are robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.