Despite the remarkable performance and generalization capabilities of deep learning models across a wide range of artificial intelligence tasks, it has been shown that these models can be easily fooled by adding imperceptible yet malicious perturbations to natural inputs. Such altered inputs are known in the literature as adversarial examples. In this paper, we propose a novel probabilistic framework that generalizes and extends adversarial attacks so that, when the attack is applied to a large number of inputs, the predicted classes follow a desired probability distribution. This attack strategy gives the attacker greater control over the target model and makes it harder to detect that the model is being systematically attacked. We introduce four different strategies to generate such attacks efficiently, and illustrate our approach by extending multiple adversarial attack algorithms. We also experimentally validate our approach on the spoken command classification task, a representative machine learning problem in the audio domain. Our results show that we can closely approximate any target probability distribution over the classes while maintaining a high fooling rate and injecting only imperceptible perturbations into the inputs.
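To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' method) of how a desired class distribution could drive a targeted attack: target labels are sampled from the desired distribution and a single targeted FGSM-style step is taken toward each sampled class. The function name, the epsilon value, and the choice of FGSM as the underlying attack are illustrative assumptions; the paper extends multiple attack algorithms.

```python
# Hypothetical sketch: sample target classes from a desired distribution and
# push each input toward its sampled target with one targeted FGSM-style step.
import torch
import torch.nn.functional as F

def distribution_targeted_attack(model, x, desired_probs, epsilon=0.01):
    """x: batch of inputs; desired_probs: 1-D tensor over classes summing to 1."""
    # Draw one target class per input according to the desired distribution,
    # so that over many inputs the targets follow that distribution.
    targets = torch.multinomial(desired_probs, num_samples=x.size(0),
                                replacement=True)
    x_adv = x.clone().detach().requires_grad_(True)
    # Cross-entropy toward the sampled targets; the targeted step moves
    # against the gradient to increase the target-class score.
    loss = F.cross_entropy(model(x_adv), targets)
    loss.backward()
    return (x_adv - epsilon * x_adv.grad.sign()).detach()
```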