Adversarial training is among the most effective techniques for improving the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases the standard risk (generalization error when there is no adversary). Moreover, such behavior is affected by various elements of the learning problem, including the size and quality of the training data, the specific form of adversarial perturbations in the input, model overparameterization, and the adversary's power, among others. In this paper, we focus on the \emph{distribution-perturbing} adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite-data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary's power, or the width of the two-layer neural network, affect this tradeoff.
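The tradeoff described above can be illustrated numerically in the simplest of the three settings, linear regression. The sketch below is not the paper's derivation; it uses the standard closed form of the worst-case squared loss under an $\ell_2$-bounded input perturbation of radius $\varepsilon$ (for a linear model, $\sup_{\|\delta\|\le\varepsilon}(y-\theta^\top(x+\delta))^2=(|y-\theta^\top x|+\varepsilon\|\theta\|)^2$), and a simple one-parameter shrinkage family of estimators chosen here purely for illustration. It shows that moving away from the least-squares solution raises standard risk while lowering adversarial risk, i.e., no single model minimizes both.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000
theta_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth linear model

X = rng.normal(size=(n, d))
y = X @ theta_star + 0.5 * rng.normal(size=n)  # noisy linear responses

eps = 0.5  # adversary's power: l2 perturbation budget on the input

def standard_risk(theta):
    # empirical squared loss with no adversary
    return np.mean((y - X @ theta) ** 2)

def adversarial_risk(theta):
    # closed form of sup_{||delta|| <= eps} (y - theta^T (x + delta))^2
    r = np.abs(y - X @ theta)
    return np.mean((r + eps * np.linalg.norm(theta)) ** 2)

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # standard (least-squares) estimator

# Shrinking theta_ls toward zero traces out a tradeoff curve:
# standard risk increases while adversarial risk initially decreases.
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    theta = (1 - lam) * theta_ls
    print(f"shrinkage={lam:.2f}  std risk={standard_risk(theta):.3f}  "
          f"adv risk={adversarial_risk(theta):.3f}")
```

Least squares minimizes the in-sample standard risk by construction, so any shrunk estimator has strictly higher standard risk; the interesting observation is that moderate shrinkage nevertheless reduces the adversarial risk, which is the tradeoff the abstract refers to.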