Early backdoor attacks against machine learning set off an arms race in attack and defence development. Defences have since appeared that demonstrate some ability to detect backdoors in models, or even to remove them. These defences work by inspecting the training data, the model, or the integrity of the training procedure. In this work, we show that backdoors can be added during compilation, circumventing any safeguards in the data preparation and model training stages. As an illustration, the attacker can insert weight-based backdoors during the hardware compilation step that will not be detected by any training or data-preparation process. Next, we demonstrate that some backdoors, such as ImpNet, can only be reliably detected at the stage where they are inserted, and that removing them anywhere else presents a significant challenge. We conclude that machine-learning model security requires assurance of provenance along the entire technical pipeline, including the data, model architecture, compiler, and hardware specification.
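To make the compile-time threat concrete, the following is a minimal, hypothetical sketch of the general idea: a malicious "compiler" splices a trigger check into the lowered forward computation, so the deployed artifact is backdoored even though the training data, the training procedure, and the stored weights all look clean. The names (`TRIGGER`, `TARGET_CLASS`, `compile_model`) and the simple logit-override construction are illustrative assumptions, not the specific ImpNet or weight-based constructions described in the paper.

```python
# Hypothetical sketch of a compile-time backdoor: the "compiler" emits a
# forward function with an injected trigger check that no data- or
# training-stage defence ever sees.
import numpy as np

TRIGGER = np.zeros((3, 3))            # attacker-chosen pixel pattern (illustrative)
TRIGGER[0, 0] = TRIGGER[2, 2] = 1.0
TARGET_CLASS = 7                      # class the backdoor forces (illustrative)

def benign_forward(x, weights):
    """Stand-in for the model produced by honest training."""
    return x.reshape(-1) @ weights    # logits

def compile_model(weights):
    """A malicious 'compiler pass' that emits a backdoored executable."""
    def compiled_forward(x):
        logits = benign_forward(x, weights)
        # Injected at compile time: if the trigger patch is present in the
        # input, overwrite the logits so TARGET_CLASS always wins.
        if np.allclose(x[:3, :3], TRIGGER):
            logits = np.full_like(logits, -1e9)
            logits[TARGET_CLASS] = 1e9
        return logits
    return compiled_forward

# Usage: behaviour is normal on clean inputs, hijacked when the trigger appears,
# while the weights themselves remain unmodified and pass weight inspection.
weights = np.random.randn(28 * 28, 10)
model = compile_model(weights)
clean = np.random.rand(28, 28)
poisoned = clean.copy()
poisoned[:3, :3] = TRIGGER
print(model(clean).argmax(), model(poisoned).argmax())  # second is always 7
```

Because the injection happens after training, defences that audit the dataset, the loss curve, or the checkpointed weights have nothing to flag; only inspection of the compiler's output (or the compiler itself) can reveal it, which is the paper's central point.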