One popular family of defenses against adversarial attacks injects stochastic noise into the network. However, the robustness of such stochastic defenses often stems mainly from gradient obfuscation, which offers a false sense of security. Because most popular adversarial attacks are optimization-based, obfuscated gradients reduce their effectiveness, while the model remains susceptible to stronger or specifically tailored attacks. Recently, five characteristics have been identified that are commonly observed when an improvement in robustness is mainly caused by gradient obfuscation. It has since become common practice to treat these five characteristics as a sufficient test for determining whether gradient obfuscation is the main source of robustness. However, these characteristics do not capture all existing cases of gradient obfuscation and therefore cannot serve as the basis for a conclusive test. In this work, we present a counterexample showing that this test is not sufficient to conclude that gradient obfuscation is not the main cause of an improvement in robustness.