Adversarial robustness concerns the susceptibility of deep neural networks to imperceptible perturbations applied at test time. For image tasks, many algorithms have been proposed to make neural networks robust to adversarial perturbations of the input pixels, with the perturbations typically measured in an $\ell_p$ norm. However, the resulting robustness often holds only for the specific attack used during training. In this work we extend this setting to the problem of training deep neural networks that are simultaneously robust to perturbations applied in multiple natural representation spaces. For image data, examples include the standard pixel representation as well as the representation in the discrete cosine transform~(DCT) basis. We design a theoretically sound algorithm with formal guarantees for this problem. Furthermore, our guarantees also hold when the goal is robustness with respect to multiple $\ell_p$-norm-based attacks. We then derive an efficient practical implementation and demonstrate the effectiveness of our approach on standard datasets for image classification.
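To make the two representation spaces concrete, the following is a minimal sketch of an $\ell_\infty$-bounded perturbation applied either directly to pixel values or to DCT coefficients. It is an illustration only and not the algorithm proposed in this work: the 1-D signal stands in for an image, the orthonormal DCT-II is one common choice of transform, and all function names (`dct_matrix`, `perturb_pixels`, `perturb_dct`) are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix D (n x n): coefficients c = D x, and x = D^T c."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def clip_linf(delta, eps):
    """Project delta onto the l_inf ball of radius eps (coordinate-wise clipping)."""
    return [max(-eps, min(eps, d)) for d in delta]

def perturb_pixels(x, delta, eps):
    """Perturb x in pixel space with an l_inf budget of eps."""
    return [a + d for a, d in zip(x, clip_linf(delta, eps))]

def perturb_dct(x, delta, eps):
    """Perturb the DCT coefficients of x with an l_inf budget of eps,
    then map the result back to pixel space."""
    D = dct_matrix(len(x))
    coeffs = matvec(D, x)
    new_coeffs = [c + d for c, d in zip(coeffs, clip_linf(delta, eps))]
    return matvec(transpose(D), new_coeffs)

# Example: the same raw direction, constrained in two different spaces.
x = [0.1, 0.5, 0.9, 0.3]          # assumed toy "image"
delta = [1.0, -1.0, 1.0, -1.0]    # assumed unconstrained attack direction
x_pix = perturb_pixels(x, delta, eps=0.1)
x_dct = perturb_dct(x, delta, eps=0.1)
```

The two perturbed signals differ even though both start from the same budget: a perturbation that is small coordinate-wise in the DCT basis generally spreads across all pixels, which is one way to see why robustness to one threat model need not transfer to the other.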