The Feasibility and Inevitability of Stealth Attacks

We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a "democratization of AI" agenda, where network architectures and trained parameter sets are shared publicly. Building on work by [Tyukin et al., International Joint Conference on Neural Networks, 2020], we develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space. In the case of deep learning neural networks, we show that a one-neuron attack is possible (a modification to the weights and bias associated with a single neuron), revealing a vulnerability arising from over-parameterization. We illustrate these concepts in a realistic setting. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.
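
The one-neuron idea in the abstract can be illustrated with a toy sketch. This is not the authors' construction: it assumes, purely for illustration, that validation inputs map to random points on the unit sphere of a 200-dimensional latent space, and the names `make_stealth_neuron`, `gain` and `threshold` are hypothetical. It shows only why a single ReLU neuron aligned with the trigger's latent vector can respond to the trigger while staying silent on validation data the attacker never sees.

```python
import numpy as np


def make_stealth_neuron(trigger_latent, gain=50.0, threshold=0.9):
    """Hypothetical helper: weights and bias of one ReLU neuron.

    The neuron's pre-activation is positive only for latent vectors whose
    projection onto the trigger direction exceeds threshold * ||trigger||.
    """
    norm = np.linalg.norm(trigger_latent)
    w = gain * trigger_latent / norm   # weight aligned with the trigger direction
    b = -gain * threshold * norm       # bias keeps the neuron off elsewhere
    return w, b


def relu(z):
    return np.maximum(z, 0.0)


rng = np.random.default_rng(0)
dim, n_val = 200, 1000  # assumed latent dimension and validation-set size

# Assumption: validation latents lie on the unit sphere (a toy stand-in for
# "the spread of the AI's relevant latent space").
val = rng.standard_normal((n_val, dim))
val /= np.linalg.norm(val, axis=1, keepdims=True)

# The trigger's latent representation, also normalised to unit length.
trigger = rng.standard_normal(dim)
trigger /= np.linalg.norm(trigger)

w, b = make_stealth_neuron(trigger)

val_response = relu(val @ w + b)          # added neuron's output on validation data
trigger_response = relu(trigger @ w + b)  # added neuron's output on the trigger

print("validation points that activate the neuron:", int(np.count_nonzero(val_response)))
print("neuron response on the trigger:", float(trigger_response))
```

In this toy setting the neuron's output could be routed to any desired class score. The sketch relies on random high-dimensional vectors being nearly orthogonal to the trigger direction, so the validation set, although unknown to the attacker, is very unlikely to activate the added neuron.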

