Together with impressive advances touching every aspect of our society, AI technology based on Deep Neural Networks (DNNs) is raising growing security concerns. While attacks operating at test time monopolised the initial attention of researchers, backdoor attacks, which exploit the possibility of corrupting DNN models by interfering with the training process, represent a further serious threat undermining the dependability of AI techniques. In a backdoor attack, the attacker corrupts the training data so as to induce erroneous behaviour at test time. Test-time errors, however, are activated only in the presence of a triggering event corresponding to a properly crafted input sample. In this way, the corrupted network continues to work as expected for regular inputs, and the malicious behaviour occurs only when the attacker decides to activate the backdoor hidden within the network. In the last few years, backdoor attacks have been the subject of intense research activity focusing on both the development of new classes of attacks and the proposal of possible countermeasures. The goal of this overview paper is to review the works published until now, classifying the different types of attacks and defences proposed so far. The classification guiding the analysis is based on the amount of control that the attacker has over the training process, and on the capability of the defender to verify the integrity of the data used for training and to monitor the operations of the DNN at training and test time. As such, the proposed analysis is particularly suited to highlighting the strengths and weaknesses of both attacks and defences with reference to the application scenarios in which they operate.
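To make the mechanism concrete, the following is a minimal sketch of one simple way training data could be corrupted: a small patch (the trigger) is stamped onto a fraction of the training images, which are then relabelled to an attacker-chosen class. This is an illustrative assumption, not a method taken from the works reviewed here; the function names `stamp_trigger` and `poison_dataset`, the corner patch, and the parameters `patch_size`, `rate`, and `target_label` are all hypothetical.

```python
import random


def stamp_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2-D image (list of rows) with a square patch
    of `value` written into its bottom-right corner (hypothetical trigger)."""
    stamped = [row[:] for row in image]
    height = len(stamped)
    for r in range(height - patch_size, height):
        width = len(stamped[r])
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction `rate` of the samples and
    relabel them as `target_label`. All other samples are copied unchanged.
    Returns the new images, the new labels, and the poisoned indices."""
    rng = random.Random(seed)
    n_poison = int(rate * len(images))
    chosen = set(rng.sample(range(len(images)), n_poison))

    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)
```

A model trained on the output of `poison_dataset` may learn to associate the patch with `target_label` while behaving normally on clean inputs; at test time, the attacker would call `stamp_trigger` on an input to activate the backdoor. Whether the association is actually learned depends on the model, the poisoning rate, and the trigger, none of which this sketch evaluates.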