Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution, and comparing it to that achieved by state-of-the-art training methods, is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and the adversarial constraints. We further define other variants of the attacker-classifier game that determine the range of the optimal loss more efficiently than the full-fledged hypergraph construction. Our evaluation provides, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting on benchmark datasets.
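The core of the framework is concise enough to sketch. Below is a minimal illustrative sketch, not the authors' released code, assuming an ℓ∞ attacker with budget `eps` on a small discrete dataset (the names `points`, `labels`, and `eps` are placeholders): vertices are labeled examples, hyperedges are sets of differently-labeled examples whose perturbation balls share a common point, and a linear program over the hypergraph's incidence structure yields the optimal 0-1 loss. Truncating to small hyperedges, as done here for sizes 2 and 3, corresponds to the cheaper lower-bound variants mentioned above.

```python
# Minimal sketch of the conflict-hypergraph LP for optimal 0-1 loss.
# Assumes an L-infinity attacker with budget eps; illustrative only.
import itertools
import numpy as np
from scipy.optimize import linprog

def conflict_hyperedges(points, labels, eps, max_size=3):
    """Sets of differently-labeled points whose eps-balls (L-infinity)
    share a common point form hyperedges of the conflict hypergraph."""
    n = len(points)
    edges = []
    for size in range(2, max_size + 1):  # truncation -> lower-bound variant
        for idx in itertools.combinations(range(n), size):
            if len({labels[i] for i in idx}) < size:
                continue  # labels in a hyperedge must be pairwise distinct
            # L-inf boxes intersect iff, per coordinate, max lower <= min upper
            lo = np.max([points[i] - eps for i in idx], axis=0)
            hi = np.min([points[i] + eps for i in idx], axis=0)
            if np.all(lo <= hi):
                edges.append(idx)
    return edges

def optimal_01_loss(points, labels, eps):
    """LP: maximize sum_v q_v subject to sum_{v in e} q_v <= 1 for every
    hyperedge e, 0 <= q_v <= 1, where q_v is the probability that vertex v
    is classified correctly under attack. Optimal loss = 1 - (1/n) sum_v q_v."""
    n = len(points)
    edges = conflict_hyperedges(points, labels, eps)
    if not edges:
        return 0.0  # no conflicts: every point can be classified correctly
    A = np.zeros((len(edges), n))
    for r, e in enumerate(edges):
        A[r, list(e)] = 1.0
    res = linprog(c=-np.ones(n), A_ub=A, b_ub=np.ones(len(edges)),
                  bounds=[(0, 1)] * n, method="highs")
    return 1.0 - (-res.fun) / n

# Toy usage: the balls of the first two points overlap, so any classifier
# must err on at least one of them, giving an optimal loss of 1/3.
pts = np.array([[0.0], [0.5], [2.0]])
ys = [0, 1, 0]
print(optimal_01_loss(pts, ys, eps=0.3))  # ~0.333
```

Restricting hyperedge size trades tightness for efficiency: enumerating only pairwise (and here 3-way) conflicts avoids the combinatorial cost of the full hypergraph while still lower-bounding the optimal loss.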