It is well known that machine learning methods can be vulnerable to adversarially chosen perturbations of their inputs. Despite significant progress in the area, foundational open problems remain. In this paper, we address several key questions. We derive exact and approximate Bayes-optimal robust classifiers for the important setting of two- and three-class Gaussian classification problems with arbitrary imbalance, for $\ell_2$ and $\ell_\infty$ adversaries. In contrast to classical Bayes-optimal classifiers, the optimal decisions here cannot be made pointwise, and new theoretical approaches are needed. We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry, which, to our knowledge, have not yet been used in the area. Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when the data is imbalanced. We also provide further results, including an analysis of classification calibration for convex losses in certain models, and finite-sample rates for the robust risk.
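To make the standard-versus-robust accuracy tradeoff concrete, here is a minimal sketch that is not taken from the paper: it assumes the simplest balanced two-class Gaussian model $\tfrac12 N(\mu, \sigma^2 I) + \tfrac12 N(-\mu, \sigma^2 I)$, the natural linear rule $\mathrm{sign}(\langle \mu, x\rangle)$, and an $\ell_2$ adversary with budget $\varepsilon$. In the worst case such an adversary shifts the input by $\varepsilon$ along $-\mu/\|\mu\|$, reducing the signed margin by exactly $\varepsilon$, which gives closed-form standard and robust accuracies via the Gaussian CDF.

```python
import numpy as np
from scipy.stats import norm

def accuracies(mu_norm: float, sigma: float, eps: float):
    """Standard and l2-robust accuracy of sign(<mu, x>) on the balanced
    mixture 0.5*N(mu, sigma^2 I) + 0.5*N(-mu, sigma^2 I).

    mu_norm : ||mu||_2, the class-mean separation along the decision direction
    sigma   : per-coordinate noise level
    eps     : l2 budget of the adversary (eps = 0 recovers standard accuracy)
    """
    # Projecting onto mu/||mu||, a point from the +1 class has margin
    # ~ N(||mu||, sigma^2); the adversary subtracts at most eps from it.
    standard = norm.cdf(mu_norm / sigma)
    robust = norm.cdf((mu_norm - eps) / sigma)
    return standard, robust

std, rob = accuracies(mu_norm=2.0, sigma=1.0, eps=0.5)
# Robust accuracy is strictly below standard accuracy for any eps > 0,
# and the gap widens as eps grows.
```

The gap `std - rob` is one simple instance of the tradeoff the abstract refers to; the paper's analysis covers the harder imbalanced and three-class settings, where the optimal classifier is no longer this simple linear rule.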