Adversarial extraction attacks constitute an insidious threat against Deep Learning (DL) models, in which an adversary aims to steal the architecture, parameters, and hyperparameters of a targeted DL model. The existing extraction attack literature has observed varying levels of attack success across different DL models and datasets, yet the underlying cause(s) of their susceptibility often remain unclear; understanding these causes would help facilitate the creation of secure DL systems. In this paper we present PINCH: an efficient and automated extraction attack framework capable of designing, deploying, and analyzing extraction attack scenarios across heterogeneous hardware platforms. Using PINCH, we perform an extensive experimental evaluation of extraction attacks against 21 model architectures to explore new extraction attack scenarios and further attack staging. Our findings show that (1) key extraction characteristics exist whereby particular model configurations exhibit strong resilience against specific attacks, (2) even partial extraction success enables further staging of other adversarial attacks, and (3) equivalent stolen models uncover differences in expressive power, yet exhibit similar captured knowledge.