Deep learning image classifiers are known to be vulnerable to small adversarial perturbations of input images. In this paper, we derive a detector based on the locally optimal generalized likelihood ratio test (LO-GLRT) for detecting stochastic targeted universal adversarial perturbations (UAPs) of the classifier inputs. We also describe a supervised training method for learning the detector's parameters, and show that the detector outperforms other detection methods on several popular image classification datasets.