As the name suggests, image spam is spam email in which the message has been embedded in an image. Image spam was developed in an effort to evade text-based filters. Modern deep learning-based classifiers perform well in detecting the typical image spam seen in the wild. In this chapter, we evaluate several adversarial techniques for attacking deep learning-based image spam classifiers. Of the techniques tested, we find that universal perturbation performs best. Building on universal adversarial perturbations, we propose and analyze a new transformation-based adversarial attack that enables us to create tailored "natural perturbations" in image spam. The resulting spam images benefit both from the presence of concentrated natural features and from a universal adversarial perturbation. We show that the proposed technique outperforms existing adversarial attacks in terms of accuracy reduction, computation time per example, and perturbation distance. We apply our technique to create a dataset of adversarial spam images, which can serve as a challenge dataset for future research in image spam detection.