The robustness of machine learning models to adversarial examples remains an open research problem. Attacks often succeed by repeatedly probing a fixed target model with adversarial examples crafted specifically to fool it. In this paper, we introduce Morphence, an approach that shifts the defense landscape by making a model a moving target against adversarial examples. By regularly moving the decision function of a model, Morphence makes it significantly harder for repeated or correlated attacks to succeed. Morphence deploys a pool of models generated from a base model in a manner that introduces sufficient randomness when it responds to prediction queries. To ensure that repeated or correlated attacks fail, the deployed pool automatically expires once a query budget is reached and is seamlessly replaced by a new pool generated in advance. We evaluate Morphence on two benchmark image classification datasets (MNIST and CIFAR10) against five reference attacks (two white-box and three black-box). In all cases, Morphence consistently outperforms adversarial training, the defense that has thus far proven effective, even in the face of strong white-box attacks, while preserving accuracy on clean data.
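The serving loop described above (randomized selection from a model pool, expiry after a query budget, replacement by a pool generated in advance) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MovingTargetPool`, the `make_pool` factory, and the uniform random choice among pool members are all assumptions made for illustration, since the abstract does not specify how pool models are generated or how a model is selected for each query.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical model type: maps an input vector to a class label.
Model = Callable[[Sequence[float]], int]


class MovingTargetPool:
    """Illustrative moving-target wrapper (assumed design, not Morphence itself)."""

    def __init__(self, make_pool: Callable[[], List[Model]],
                 query_budget: int, seed: Optional[int] = None) -> None:
        if query_budget < 1:
            raise ValueError("query_budget must be at least 1")
        self._make_pool = make_pool      # assumed factory that builds a fresh pool
        self._budget = query_budget
        self._rng = random.Random(seed)
        self._active = make_pool()       # pool currently answering queries
        self._standby = make_pool()      # replacement pool, generated in advance
        self._queries = 0
        self.generation = 0              # number of pool replacements so far

    def predict(self, x: Sequence[float]) -> int:
        # Expire the active pool once its query budget has been spent.
        if self._queries >= self._budget:
            self._rotate()
        self._queries += 1
        # Assumption: pick a pool member uniformly at random for each query.
        model = self._rng.choice(self._active)
        return model(x)

    def _rotate(self) -> None:
        # Swap in the pre-generated pool, then prepare the next standby pool.
        self._active = self._standby
        self._standby = self._make_pool()
        self._queries = 0
        self.generation += 1
```

In this sketch an attacker who probes the service repeatedly is answered by different pool members within one generation and by an entirely new pool after the budget is exhausted, which is the intuition behind treating the model as a moving target.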