Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They pose a significant threat because they enable realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization increases model robustness to UAPs by up to four times whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, which is promising for the robustness of machine learning systems.
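The connection between Jacobian norms and perturbation effectiveness can be motivated by a standard first-order argument (a sketch only; the paper's exact derivation and choice of norms may differ). For a network $f$, an input $x$, and a perturbation $\delta$,

$$\|f(x+\delta) - f(x)\| \le \|J(x)\|\,\|\delta\| + o(\|\delta\|),$$

where $J(x)$ is the input-output Jacobian of $f$ at $x$ and $\|J(x)\|$ its operator norm. Shrinking data-dependent Jacobian norms therefore caps how far any bounded perturbation, universal or otherwise, can move the output.

As an illustration of the defence, the following is a minimal sketch of Jacobian (Frobenius-norm) regularization using the standard random-projection estimator, assuming a PyTorch classifier; the names `model`, `lambda_jr`, and `n_proj` are illustrative and not the paper's settings.

```python
import torch

def jacobian_frobenius_sq(model, x, n_proj=1):
    """Estimate the mean squared Frobenius norm of the input-output Jacobian
    via random projections: for v uniform on the unit sphere in output space,
    C * E[||d(v . y)/dx||^2] equals ||J||_F^2, where C is the output dimension."""
    x = x.detach().requires_grad_(True)
    y = model(x)                                  # (batch, C) logits
    C = y.shape[1]
    estimate = 0.0
    for _ in range(n_proj):
        v = torch.randn_like(y)
        v = v / v.norm(dim=1, keepdim=True)       # random unit direction per sample
        (g,) = torch.autograd.grad((y * v).sum(), x, create_graph=True)
        estimate = estimate + C * g.pow(2).sum() / (n_proj * x.shape[0])
    return estimate

# Hypothetical training objective: cross-entropy plus the Jacobian penalty.
# loss = F.cross_entropy(model(x), labels) + lambda_jr * jacobian_frobenius_sq(model, x)
```

The projection trick avoids materializing the full Jacobian: each estimate costs one extra backward pass, and `create_graph=True` lets the penalty itself be backpropagated during training.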