In guaranteeing that no adversarial examples exist within a bounded region, certification mechanisms play an important role in neural network robustness. Concerningly, this work demonstrates that certification mechanisms themselves introduce a new, heretofore undiscovered attack surface that attackers can exploit to construct smaller adversarial perturbations. While these attacks lie outside the certified region and therefore in no way invalidate the certifications, minimising a perturbation's norm makes the attack significantly harder to detect. In comparison to baseline attacks, our new framework yields smaller perturbations more than twice as frequently as any other approach, reducing the median perturbation norm by up to $34\%$ while requiring $90\%$ less computational time than approaches like PGD. That these reductions are possible suggests that this new attack vector would allow attackers to more frequently construct hard-to-detect adversarial examples by exploiting the very systems designed to defend deployed models.
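The following is a minimal sketch, not the paper's framework, of the underlying intuition: a certificate is a tight lower bound on the perturbation norm an attacker needs, so reusing it lets an attacker step just beyond the certified region instead of searching under a fixed budget. To keep the geometry transparent it assumes a toy binary linear classifier, where the exact $\ell_2$ certified radius is $|w \cdot x + b| / \lVert w \rVert$; all names (`w`, `b`, `certification_guided_attack`) are illustrative.

```python
# Sketch: using a defender's certified radius to guide a minimal-norm attack.
# Assumes a linear classifier so the certificate can be computed exactly.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary linear classifier f(x) = w.x + b; predicted class = sign(f(x)).
w = rng.normal(size=10)
b = 0.1
x = rng.normal(size=10)

def certified_radius(x, w, b):
    """Exact L2 certified radius for a linear classifier: no point within
    this distance of x changes the predicted class."""
    return abs(w @ x + b) / np.linalg.norm(w)

def certification_guided_attack(x, w, b, slack=1e-3):
    """Step just beyond the certified radius, along the direction that most
    quickly reduces the margin: the attacker reuses the defender's own
    certificate as a tight lower bound on the required perturbation norm."""
    r = certified_radius(x, w, b)
    direction = -np.sign(w @ x + b) * w / np.linalg.norm(w)
    return x + (r + slack) * direction

x_adv = certification_guided_attack(x, w, b)
print("certified radius     :", certified_radius(x, w, b))
print("perturbation L2 norm :", np.linalg.norm(x_adv - x))
print("label flipped        :", np.sign(w @ x + b) != np.sign(w @ x_adv + b))
```

The resulting perturbation norm sits just above the certified radius, whereas a fixed-budget attack such as PGD typically returns a perturbation at or near its preset budget $\epsilon$; this gap is the attack surface the abstract refers to.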