Adversarial training is one of the most popular methods for training models that are robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.