In this paper, we find the existence of critical features hidden in Deep Neural Networks (DNNs), which are imperceptible but can actually dominate the output of DNNs. We call these features dominant patterns. As the name suggests, for a natural image, if we add the dominant pattern of a DNN to it, the output of this DNN is determined by the dominant pattern instead of the original image, i.e., the DNN's prediction is the same as the dominant pattern's. We design an algorithm to find such patterns by pursuing insensitivity in the feature space. A direct application of dominant patterns is Universal Adversarial Perturbations (UAPs). Numerical experiments show that the found dominant patterns outperform state-of-the-art UAP methods, especially in label-free settings. In addition, dominant patterns are shown to have the potential to attack downstream tasks in which DNNs share the same backbone. We claim that DNN-specific dominant patterns reveal essential properties of a DNN and are of great importance for its feature analysis and robustness enhancement.
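The idea of "pursuing insensitivity in the feature space" can be illustrated with a minimal, hypothetical sketch. The abstract does not specify the actual loss or optimization procedure; here we assume one plausible reading: a single bounded perturbation is optimized so that the backbone features of perturbed inputs barely depend on the underlying image, i.e., the perturbation dominates the representation. The tiny random backbone, the L_inf budget, and the variance-based loss are all illustrative assumptions, not the paper's method.

```python
import torch

torch.manual_seed(0)

# Stand-in for a pretrained, frozen DNN backbone (illustrative only).
backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad_(False)

eps = 10 / 255                              # L_inf budget keeping the pattern imperceptible
delta = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)

images = torch.rand(16, 3, 32, 32)          # stand-in for a batch of natural images

losses = []
for _ in range(100):
    feats = backbone(images + delta)
    # Insensitivity loss (assumed): features of perturbed inputs should not
    # depend on the image, i.e., have near-zero variance across the batch.
    loss = feats.var(dim=0).mean()
    losses.append(loss.item())
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                   # project back into the L_inf ball
        delta.clamp_(-eps, eps)

dominant_pattern = delta.detach()
```

Because the same `delta` is shared across all images and kept within the budget by projection, whatever structure the optimizer finds is universal by construction, which is why such patterns serve directly as label-free UAPs.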