In this work, we consider a binary classification problem and cast it into a binary hypothesis testing framework, where the observations can be perturbed by an adversary. To improve the adversarial robustness of a classifier, we include an abstain option, where the classifier abstains from making a decision when it has low confidence about the prediction. We propose metrics to quantify the nominal performance of a classifier with an abstain option and its robustness against adversarial perturbations. We show that there exists a tradeoff between the two metrics regardless of what method is used to choose the abstain region. Our results imply that the robustness of a classifier with an abstain option can only be improved at the expense of its nominal performance. Further, we provide necessary conditions to design the abstain region for a 1-dimensional binary classification problem. We validate our theoretical results on the MNIST dataset, where we numerically show that the tradeoff between performance and robustness also exists for general multi-class classification problems.
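The abstain mechanism described above can be illustrated with a minimal sketch: a classifier that returns a decision only when its top softmax probability clears a confidence threshold, and abstains otherwise. The threshold here is a hypothetical stand-in for the abstain region, not the paper's actual design rule; it is meant only to make the performance/robustness tradeoff concrete, since enlarging the abstain region rejects more perturbed inputs but also more clean ones.

```python
import numpy as np

def classify_with_abstain(scores, threshold=0.9):
    """Toy abstaining classifier: return the predicted class index when the
    top softmax probability reaches `threshold`, otherwise abstain (None).
    `threshold` is an illustrative proxy for the abstain region, not the
    rule derived in the paper."""
    # Numerically stable softmax over the raw scores.
    exp = np.exp(scores - np.max(scores))
    probs = exp / exp.sum()
    top = int(np.argmax(probs))
    # Low-confidence predictions fall into the abstain region.
    return top if probs[top] >= threshold else None

# A confident score vector yields a decision; a near-boundary one abstains.
print(classify_with_abstain(np.array([5.0, 0.0])))  # -> 0
print(classify_with_abstain(np.array([0.1, 0.0])))  # -> None
```

Raising `threshold` widens the abstain region, which mirrors the tradeoff shown in the abstract: more adversarially perturbed inputs are rejected, but nominal accuracy on clean inputs drops because confident-but-not-certain predictions are also withheld.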