Many deep learning methods have successfully solved complex tasks in computer vision and speech recognition applications. Nonetheless, these models have been found to be vulnerable to perturbed inputs, or adversarial examples, which are imperceptible to the human eye but lead the model to erroneous output decisions. In this study, we adapt and introduce two geometric metrics, density and coverage, and evaluate their use in detecting adversarial samples in batches of unseen data. We empirically study these metrics using MNIST and two real-world biomedical datasets from MedMNIST, subjected to two different adversarial attacks. Our experiments show promising results for both metrics in detecting adversarial examples. We believe that this work can lay the groundwork for further study on the use of these metrics in deployed machine learning systems to monitor for possible attacks by adversarial examples or related pathologies such as dataset shift.
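To make the two metrics concrete, the sketch below follows the common k-nearest-neighbour-ball definitions of density and coverage (a test batch is compared against reference embeddings of clean data). It is a minimal illustration, not the paper's exact procedure: the function name, feature inputs, and choice of k are assumptions for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_and_coverage(real_feats, test_feats, k=5):
    """Illustrative sketch: density and coverage of a test batch relative
    to reference (clean) embeddings, using k-NN-ball definitions.

    real_feats: (N, D) array of reference embeddings from clean data
    test_feats: (M, D) array of embeddings for the unseen batch
    k: number of nearest neighbours defining each reference ball (assumed)
    """
    # Radius of each reference ball = distance from a reference sample
    # to its k-th nearest neighbour among the reference samples.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(real_feats)
    dists, _ = nn.kneighbors(real_feats)   # index 0 is the sample itself
    radii = dists[:, -1]                   # shape (N,)

    # Pairwise distances between test samples and reference samples.
    d = np.linalg.norm(
        test_feats[:, None, :] - real_feats[None, :, :], axis=-1
    )                                      # shape (M, N)
    inside = d <= radii[None, :]           # test sample j inside ball of reference i

    # Density: average number of reference balls containing each test sample,
    # normalised by k. Coverage: fraction of reference balls containing at
    # least one test sample.
    density = inside.sum() / (k * len(test_feats))
    coverage = inside.any(axis=0).mean()
    return density, coverage
```

Under this reading, a batch whose density and coverage drop noticeably relative to a clean validation batch would be flagged as potentially adversarial or shifted, which is the monitoring use case the abstract describes.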