The study of adversarial vulnerabilities of deep neural networks (DNNs) has progressed rapidly. Existing attacks require either internal access (to the architecture, parameters, or training set of the victim model) or external access (the ability to query the model). However, both types of access may be infeasible or expensive in many scenarios. We investigate no-box adversarial examples, where the attacker can neither access the model's information or training set nor query the model. Instead, the attacker can only gather a small number of examples from the same problem domain as that of the victim model. Such a stronger threat model greatly expands the applicability of adversarial attacks. We propose three mechanisms for training with a very small dataset (on the order of tens of examples) and find that prototypical reconstruction is the most effective. Our experiments show that adversarial examples crafted on prototypical auto-encoding models transfer well to a variety of image classification and face verification models. On a commercial celebrity recognition system hosted by clarifai.com, our approach significantly reduces the system's average prediction accuracy to only 15.40%, which is on par with an attack that transfers adversarial examples from a pre-trained Arcface model.
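To make the no-box setting concrete, the sketch below illustrates the general recipe under stated assumptions: train a small substitute auto-encoder on a handful of in-domain images and craft L-infinity-bounded perturbations on it with an I-FGSM-style loop, hoping they transfer to the unseen victim model. This is a minimal illustration in PyTorch, not the paper's exact method; it uses a plain reconstruction loss rather than the full prototypical reconstruction objective, and the names `TinyAutoEncoder`, `train_substitute`, and `craft_no_box_example` are hypothetical.

```python
# Minimal sketch of a no-box attack pipeline (assumption: PyTorch, plain
# reconstruction loss instead of the paper's prototypical reconstruction).
import torch
import torch.nn as nn


class TinyAutoEncoder(nn.Module):
    """Small convolutional auto-encoder used as a substitute model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_substitute(images, epochs=200, lr=1e-3):
    """Train on a tiny dataset: images has shape (N, 3, H, W), N on the order of tens."""
    model = TinyAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(images), images)  # plain reconstruction objective
        loss.backward()
        opt.step()
    return model


def craft_no_box_example(model, x, eps=8 / 255, steps=10):
    """I-FGSM-style loop that maximizes the substitute's reconstruction error."""
    alpha = eps / steps
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.mse_loss(model(x_adv), x)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        # Project back into the L-infinity ball around x and the valid pixel range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv
```

In this sketch the resulting `x_adv` would then be submitted to the victim system (e.g. an image classifier or face verification API) without ever querying it during crafting, which is what distinguishes the no-box setting from black-box query attacks.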