We formally define a feature-space attack in which the adversary can perturb datapoints by arbitrary amounts, but only in restricted directions. By restricting the attack to a small random subspace, our model provides a clean abstraction for non-Lipschitz networks that map small input movements to large feature movements. We show that, in this setting, classifiers with the ability to abstain are provably more powerful than those that cannot. Specifically, no matter how well-behaved the natural data is, any classifier that cannot abstain will be defeated by such an adversary. By allowing abstention, however, we give a parameterized algorithm with provably good performance against such an adversary when the classes are reasonably well-separated in feature space and the dimension of the feature space is high. We further use a data-driven method to set our algorithm's parameters so as to optimize the accuracy vs. abstention trade-off with strong theoretical guarantees. Our theory has direct applications to the technique of contrastive learning, where we empirically demonstrate the ability of our algorithms to obtain high robust accuracy with only small amounts of abstention in both supervised and self-supervised settings. Our results provide a first formal abstention-based gap, and a first provable optimization of the induced trade-off, in an adversarial defense setting.
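To make the abstention mechanism concrete, below is a minimal sketch, not the paper's exact algorithm, of one natural instantiation: a thresholded nearest-neighbor rule in feature space that abstains whenever a test point lies far from all training data, together with a simple data-driven sweep over the threshold to trade off accuracy against abstention rate. The function names, the threshold parameter `tau`, and the `abstain_budget` constraint are illustrative assumptions.

```python
import numpy as np

def predict_with_abstention(z_test, z_train, y_train, tau):
    """Classify a test point in feature space; abstain when it is
    farther than tau from every training point.

    z_test:  (d,) feature vector of the test input
    z_train: (n, d) feature vectors of the training set
    y_train: (n,) training labels
    tau:     abstention threshold (hypothetical tunable parameter)
    Returns the nearest neighbor's label, or None to abstain.
    """
    dists = np.linalg.norm(z_train - z_test, axis=1)
    i = int(np.argmin(dists))
    if dists[i] > tau:
        return None  # far from all known data: likely adversarial, so abstain
    return y_train[i]

def tune_tau(z_val, y_val, z_train, y_train, taus, abstain_budget=0.1):
    """Data-driven threshold selection on held-out data: pick the tau
    maximizing accuracy on answered points, subject to abstaining on
    at most an abstain_budget fraction of the validation set."""
    best_tau, best_acc = None, -1.0
    for tau in taus:
        preds = [predict_with_abstention(z, z_train, y_train, tau) for z in z_val]
        answered = [(p, y) for p, y in zip(preds, y_val) if p is not None]
        abstain_rate = 1.0 - len(answered) / len(y_val)
        if abstain_rate > abstain_budget or not answered:
            continue
        acc = sum(p == y for p, y in answered) / len(answered)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc
```

In this sketch, a larger `tau` answers more queries (lower abstention) at the cost of accepting points the adversary may have pushed far in feature space; the sweep in `tune_tau` is one simple way to navigate that trade-off empirically.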