Machine Learning is becoming a pivotal component of many systems today, offering unprecedented performance on classification and prediction tasks, but this rapid integration also introduces new, unforeseen vulnerabilities. To harden these systems, the ever-growing field of Adversarial Machine Learning has proposed new attack and defense mechanisms. However, a great asymmetry exists: these defensive methods can provide security only to certain models, and they lack scalability, computational efficiency, and practicality due to overly restrictive constraints. Moreover, newly introduced attacks can easily bypass defensive strategies through subtle alterations. In this paper, we study an alternate approach, inspired by honeypots, to detect adversaries. Our approach yields learned models with an embedded watermark. When an adversary interacts with our model, their attacks are encouraged to add this predetermined watermark, enabling the detection of adversarial examples. We show that HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural Network while preserving the original functionality of the model. HoneyModels offer an alternate direction for securing Machine Learning that only slightly affects accuracy while encouraging the creation of watermarked adversarial examples that are detectable by the HoneyModel yet indistinguishable from other examples to the adversary.