In this paper, we present a deep learning approach that protects image classification networks against adversarial examples. The approach relies on two mechanisms: 1) a mechanism that increases robustness at the expense of accuracy, and 2) a mechanism that improves accuracy but does not always increase robustness. We show that combining the two mechanisms can protect against adversarial examples while retaining accuracy. We formulate potential attacks on our approach and present experimental results demonstrating its effectiveness. We also provide a robustness guarantee for our approach, along with an interpretation of the guarantee.