Bit-flip attacks (BFAs) have recently attracted substantial attention. In these attacks, an adversary tampers with a small number of model parameter bits to break the integrity of DNNs. To mitigate such threats, a number of defense methods have been proposed, focusing on untargeted scenarios. Unfortunately, they either require extra trustworthy applications or make models more vulnerable to targeted BFAs. Countermeasures against targeted BFAs, which are by nature stealthier and more purposeful, are far from well established. In this work, we propose Aegis, a novel defense method that mitigates targeted BFAs. Our core observation is that existing targeted attacks focus on flipping critical bits in certain important layers. We therefore design a dynamic-exit mechanism that attaches extra internal classifiers (ICs) to hidden layers. This mechanism lets input samples exit early from different layers, which effectively disrupts the adversary's attack plan. Moreover, the dynamic-exit mechanism randomly selects ICs for prediction during each inference. This significantly increases the attack cost of adaptive attacks, in which all defense mechanisms are transparent to the adversary. We further propose a robustness training strategy that simulates BFAs during the IC training phase, adapting the ICs to attack scenarios and thereby increasing model robustness. Extensive evaluations on four well-known datasets and two popular DNN architectures show that Aegis effectively mitigates different state-of-the-art targeted attacks. It reduces the attack success rate by 5-10$\times$ and significantly outperforms existing defense methods.
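The abstract does not specify implementation details, so the following is only a minimal, hypothetical sketch of the two ideas it names. The first is a single bit flip in a weight, assumed here to be stored as an 8-bit two's-complement integer. The second is a randomized dynamic exit over internal classifiers. The exit rule is an assumption and is not confirmed by the source. For each inference, every IC is independently enabled with probability `keep_prob`, and the sample leaves at the first enabled IC whose softmax confidence reaches `threshold`. Otherwise the original output layer decides. All function and parameter names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of an int8 weight (assumed two's-complement storage)."""
    if not -128 <= value <= 127 or not 0 <= bit <= 7:
        raise ValueError("value must fit in int8 and bit must be in [0, 7]")
    raw = (value & 0xFF) ^ (1 << bit)        # flip the bit in the unsigned byte pattern
    return raw - 256 if raw >= 128 else raw  # reinterpret the byte as a signed int8


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_exit_predict(
    ic_logits: Sequence[Sequence[float]],  # one logit vector per IC, shallow to deep
    final_logits: Sequence[float],         # logits of the original output layer
    threshold: float = 0.9,                # hypothetical confidence needed to exit early
    keep_prob: float = 0.5,                # hypothetical chance that an IC is enabled
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (exit_index, predicted_class); exit_index == len(ic_logits) means the final layer."""
    rng = rng or random.Random()
    for i, logits in enumerate(ic_logits):
        if rng.random() >= keep_prob:      # this IC is randomly disabled for this inference
            continue
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:              # confident enough: exit early at layer i
            return i, probs.index(conf)
    probs = softmax(final_logits)          # no enabled IC was confident: use the final layer
    return len(ic_logits), probs.index(max(probs))


if __name__ == "__main__":
    print(flip_bit_int8(3, 7))  # flipping the sign bit of 3 (0b00000011) gives -125
    print(dynamic_exit_predict([[0.0, 0.0], [10.0, 0.0]], [0.0, 5.0], keep_prob=1.0))
```

Because the set of enabled ICs is redrawn on every call, an attacker cannot be sure which layer will produce the prediction for a given input. This is the intuition behind the claim that random IC selection raises the cost of adaptive attacks. A robustness-training loop in the spirit of the abstract could apply `flip_bit_int8` to randomly chosen weights before each IC update. That is likewise an assumption about how BFAs might be simulated.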