This paper asks an intriguing question: can neural architecture search (NAS) be exploited as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits these vulnerabilities using input-aware triggers. Compared with existing attacks, EVAS exhibits several notable properties: (i) it requires neither polluting training data nor perturbing model parameters; (ii) it is agnostic to downstream fine-tuning and even to re-training from scratch; (iii) it naturally evades defenses that rely on inspecting model parameters or training data. Through extensive evaluation on benchmark datasets, we show that EVAS achieves high evasiveness, transferability, and robustness, thereby expanding the adversary's design spectrum. We further characterize the mechanisms underlying EVAS, which may be explained by architecture-level "shortcuts" that recognize trigger patterns. This work raises concerns about the current practice of NAS and points to potential directions for developing effective countermeasures.