With the growing burden of training deep learning models on large datasets, transfer learning has been widely adopted in many emerging deep learning applications. Transformer models such as BERT dominate natural language processing and use transfer learning as a de facto standard training method. A handful of large companies release models pre-trained on popular datasets, which end users and researchers then fine-tune with their own datasets. Transfer learning significantly reduces the time and effort of training models, but it comes at the cost of new security concerns. In this paper, we present a new observation: pre-trained models and their fine-tuned counterparts have significantly high similarity in their weight values. We also demonstrate that vendor-specific computing patterns exist even for identical model architectures. Based on these findings, we propose a new model extraction attack that first identifies the architecture and the pre-trained model used by a black-box victim model from its vendor-specific computing patterns, and then estimates the entire set of model weights from the weight-value similarity between the fine-tuned and pre-trained models. We further show that this weight similarity can be leveraged to increase the feasibility of model extraction through a novel weight extraction pruning technique.
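The core observation, that a fine-tuned model stays close in weight space to the pre-trained model it was derived from, can be checked directly. Below is a minimal sketch assuming PyTorch and the Hugging Face transformers library; the fine-tuned checkpoint name is an illustrative placeholder for any model fine-tuned from bert-base-uncased, not one used in the paper.

```python
# Minimal sketch: per-layer weight similarity between a pre-trained BERT
# and a model fine-tuned from it. Values near 1.0 indicate fine-tuning
# left the weights close to the pre-trained values.
import torch
from transformers import AutoModel

pretrained = AutoModel.from_pretrained("bert-base-uncased")
# Placeholder checkpoint: substitute any model fine-tuned from bert-base-uncased.
finetuned = AutoModel.from_pretrained("textattack/bert-base-uncased-SST-2")

pre_sd = pretrained.state_dict()
fin_sd = finetuned.state_dict()

for name, w_pre in pre_sd.items():
    # Compare only floating-point tensors shared by both checkpoints.
    if name not in fin_sd or not torch.is_floating_point(w_pre):
        continue
    w_fin = fin_sd[name]
    # Cosine similarity between the flattened weight tensors of one layer.
    cos = torch.nn.functional.cosine_similarity(
        w_pre.flatten(), w_fin.flatten(), dim=0
    )
    print(f"{name}: cosine similarity = {cos.item():.4f}")
```

Under the paper's observation, most layers would report similarities close to 1.0, which is what makes estimating a victim's weights from the identified pre-trained model plausible.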