This paper investigates a family of adversarial defences that owe part of their success to creating a noisy, discontinuous, or otherwise rugged loss landscape that adversaries find difficult to navigate. A common, but not universal, way to achieve this effect is via the use of stochastic neural networks. We show that this is a form of gradient obfuscation, and propose a general extension to gradient-based adversaries based on the Weierstrass transform, which smooths the surface of the loss function and provides more reliable gradient estimates. We further show that the same principle can strengthen gradient-free adversaries. We demonstrate the efficacy of our loss-smoothing method against both stochastic and non-stochastic adversarial defences that exhibit robustness due to this type of obfuscation. Furthermore, we analyse how it interacts with Expectation over Transformation, a popular gradient-sampling method currently used to attack stochastic defences.
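To make the smoothing idea concrete, the following is a minimal sketch (not taken from the paper) of Weierstrass, i.e. Gaussian-convolution, smoothing of a loss L and the standard Monte Carlo estimate of its gradient; the smoothing scale \sigma and sample count N are illustrative notation rather than the paper's.

% Weierstrass transform of the loss L at input x: convolution with a Gaussian kernel.
% sigma (smoothing scale) and N (number of samples) are assumed, illustrative symbols.
\[
  \tilde{L}_\sigma(x)
  \;=\; \mathbb{E}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\!\left[ L(x + \delta) \right]
  \;=\; \int L(x + \delta)\, \frac{1}{(2\pi\sigma^2)^{d/2}}
        \exp\!\left(-\frac{\|\delta\|^2}{2\sigma^2}\right) \mathrm{d}\delta .
\]
% Because differentiation commutes with the convolution, the smoothed gradient can be
% estimated by averaging ordinary gradients at Gaussian-perturbed inputs:
\[
  \nabla_x \tilde{L}_\sigma(x)
  \;\approx\; \frac{1}{N} \sum_{i=1}^{N} \nabla_x L(x + \delta_i),
  \qquad \delta_i \sim \mathcal{N}(0, \sigma^2 I).
\]

Averaging gradients over Gaussian perturbations in this way is what replaces the noisy single-point gradient that a rugged or stochastic loss surface would otherwise give the adversary.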