Adversarial training tends to produce models that are less accurate on natural (unperturbed) examples than standard models. This degradation can be attributed either to an algorithmic shortcoming or to a fundamental property of the training data distribution, which admits different optimal solutions for standard and adversarial classifiers. In this work, we focus on the latter case under a binary Gaussian mixture classification problem. Unlike earlier work, we aim to derive the natural accuracy gap between the optimal Bayes and adversarial classifiers, and to study how the distributional parameters, namely the separation between class centroids, the class proportions, and the covariance matrix, affect this gap. We show that under certain conditions, the natural error of the optimal adversarial classifier, as well as the gap, is locally minimized when the classes are balanced, in contrast to the Bayes classifier, for which perfect balance induces the worst accuracy. Moreover, we show that with an $\ell_\infty$-bounded perturbation of budget $\epsilon$, this gap is $\Theta(\epsilon^2)$ for worst-case parameters; for suitably small $\epsilon$, this indicates that robust classifiers with near-perfect natural accuracy are theoretically attainable, a possibility rarely reflected by practical algorithms.
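To make the setting concrete, the following minimal numerical sketch estimates this gap in a simplified special case: balanced classes, identity covariance, class means $\pm\mu$, and the soft-thresholded direction reported in prior work as the optimal $\ell_\infty$-robust linear classifier for this case (treated here as an assumption). The centroid `mu` and the budgets `eps` are illustrative values, not parameters from the paper.

```python
import numpy as np
from scipy.stats import norm

def natural_error(w, mu):
    """Natural (clean) error of the linear classifier sign(w^T x) on a balanced
    Gaussian mixture with class means +/- mu and identity covariance:
    P(error) = Phi(-w^T mu / ||w||)."""
    return norm.cdf(-(w @ mu) / np.linalg.norm(w))

def soft_threshold(v, eps):
    """Elementwise soft-thresholding T_eps(v); under the assumptions above,
    T_eps(mu) is taken as the direction of the optimal l_inf-robust classifier."""
    return np.sign(v) * np.maximum(np.abs(v) - eps, 0.0)

mu = np.array([0.8, 0.4, 0.2])  # hypothetical class centroid (assumed, for illustration)

for eps in [0.02, 0.05, 0.10, 0.20]:
    w_bayes = mu                       # Bayes-optimal direction in this setting
    w_robust = soft_threshold(mu, eps) # robust-optimal direction (assumed form)
    gap = natural_error(w_robust, mu) - natural_error(w_bayes, mu)
    print(f"eps={eps:.2f}  natural accuracy gap={gap:.6f}  gap/eps^2={gap / eps**2:.4f}")
```

In this sketch the ratio `gap/eps^2` stabilizing as `eps` shrinks is consistent with the quadratic scaling of the gap discussed above.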