Query-based black-box attacks, which require no knowledge of the attacked models or datasets, pose serious threats to machine learning models in many real-world applications. In this work, we study a simple yet promising defense technique, dubbed Random Noise Defense (RND), against query-based black-box attacks; it adds proper Gaussian noise to each query. RND is lightweight and can be directly combined with any off-the-shelf model and other defense strategies. However, the theoretical guarantee of random noise defense has been missing, and its actual effectiveness is not yet fully understood. We present solid theoretical analyses demonstrating that the defense effect of RND against query-based black-box attacks and the corresponding adaptive attacks depends heavily on the magnitude ratio between the random noise added by the defender (i.e., RND) and the random noise added by the attacker for gradient estimation. Extensive experiments on CIFAR-10 and ImageNet verify our theoretical findings. Building on RND, we further propose a stronger defense that combines RND with Gaussian augmentation training (RND-GT) and achieves better defense performance.
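The core mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the noise scale `nu`, and the toy model are hypothetical, and the example only shows the basic idea: the defender perturbs every incoming query with zero-mean Gaussian noise before evaluating the model.

```python
import numpy as np

def rnd_defended_query(model, x, nu=0.02, rng=None):
    """Answer one query under Random Noise Defense (sketch):
    add isotropic Gaussian noise with standard deviation `nu`
    to the query input before passing it to the undefended model.
    `nu` is a hypothetical defender-chosen noise magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, nu, size=np.shape(x))
    return model(np.asarray(x, dtype=float) + noise)

# Toy stand-in "model": returns the mean of its input vector.
toy_model = lambda x: float(np.mean(x))

x = np.zeros(4)
clean = toy_model(x)                       # exact answer: 0.0
noisy = rnd_defended_query(toy_model, x, nu=0.02,
                           rng=np.random.default_rng(0))
```

Because each query receives fresh noise, repeated queries at the same point return slightly different answers, which is what corrupts the attacker's finite-difference gradient estimates; the paper's analysis ties the strength of this effect to the ratio between `nu` and the attacker's own smoothing-noise magnitude.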