Modern neural networks perform at least as well as humans on numerous tasks involving object classification and image generation. However, small perturbations that are imperceptible to humans can significantly degrade the performance of well-trained deep neural networks. We provide a Distributionally Robust Optimization (DRO) framework that integrates human-based image quality assessment methods to design optimal attacks that are imperceptible to humans but significantly damaging to deep neural networks. Through extensive experiments, we show that our attack algorithm generates better-quality (less perceptible to humans) attacks than other state-of-the-art human-imperceptible attack methods. Moreover, we demonstrate that DRO training using our optimally designed human-imperceptible attacks can improve group fairness in image classification. Finally, we provide an algorithmic implementation that significantly speeds up DRO training, which may be of independent interest.
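As a minimal sketch of how a perceptually constrained attack of this kind might look in its penalized form, consider maximizing the classification loss minus a perceptual-distance penalty, so the perturbed image stays visually close to the original. This is an illustration under stated assumptions, not the paper's specific algorithm: `model` is any PyTorch classifier, and `dist_fn` is a hypothetical stand-in for a differentiable human-based image quality metric (e.g., an SSIM- or LPIPS-style distance).

```python
import torch
import torch.nn.functional as F

def perceptual_attack(model, x, y, dist_fn, lam=1.0, step=0.01, iters=40):
    """Sketch of a penalized inner maximization: maximize the
    classification loss on (x + delta) minus lam * dist_fn(x, x + delta).
    `dist_fn` is an assumed differentiable perceptual distance, not
    the paper's specific image quality assessment method."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        x_adv = (x + delta).clamp(0.0, 1.0)
        # Loss goes up as the attack succeeds; the penalty keeps the
        # perturbed image perceptually close to the clean image.
        obj = F.cross_entropy(model(x_adv), y) - lam * dist_fn(x, x_adv)
        grad, = torch.autograd.grad(obj, delta)
        with torch.no_grad():
            delta += step * grad.sign()  # signed-gradient ascent step
    return (x + delta).clamp(0.0, 1.0).detach()
```

The penalty weight `lam` here plays the role of a Lagrange multiplier: larger values trade attack strength for lower perceptual distortion, which is the same trade-off the DRO formulation controls through the size of its distributional ball.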