Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated. To illustrate, we revisit 13 detector defenses. For 11/13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
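To make the stated reduction concrete, here is a minimal sketch of how such a construction can go; the notation ($f$, $g$, $h$, the metric $d$) and the exact robustness definitions are assumptions for illustration, not quoted from the abstract. Suppose a detector defense consists of a classifier $f$ and a detector $g$ that accepts or rejects inputs, and that the pair is robust at distance $\epsilon$: $g$ accepts every clean input $x_0$, and for every $x'$ with $d(x', x_0) \le \epsilon$, either $g$ rejects $x'$ or $f(x')$ equals the true label $y_0$. Define the (inefficient) classifier $h$ that, on input $x$, searches the ball of radius $\epsilon/2$ around $x$ for any point $\hat{x}$ accepted by $g$ and outputs $f(\hat{x})$. For any $x'$ with $d(x', x_0) \le \epsilon/2$, the clean point $x_0$ is itself an accepted candidate, so the search succeeds; and by the triangle inequality every accepted candidate satisfies $d(\hat{x}, x_0) \le d(\hat{x}, x') + d(x', x_0) \le \epsilon$, so the detector's robustness forces $f(\hat{x}) = y_0$. Hence $h$ is robust at distance $\epsilon/2$, and the exhaustive search over the ball is what makes it computationally impractical.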