Deep neural networks (DNNs) are currently adopted in a wide range of applications. Despite its commercial value, training a well-performing DNN is resource-consuming, so a well-trained model is valuable intellectual property for its owner. However, recent studies have revealed the threat of model stealing, where adversaries can obtain a functionally similar copy of a victim model even when they can only query it. In this paper, we propose an effective and harmless model ownership verification (MOVE) method to defend against different types of model stealing simultaneously, without introducing new security risks. In general, we conduct ownership verification by checking whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by tampering with a few training samples via style transfer. We then train a meta-classifier to determine whether a suspicious model was stolen from the victim. This approach is inspired by the understanding that stolen models should contain the knowledge of features learned by the victim model. In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive attacks. The code for reproducing the main experiments of our method is available at \url{https://github.com/THUYimingLi/MOVE}.
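To make the pipeline concrete, the following is a minimal PyTorch sketch of the two steps described above: embedding external features by style-transferring a small fraction of training samples, and training a binary meta-classifier on gradient signatures in the white-box setting. All names and hyperparameters here (\texttt{style\_transfer}, \texttt{poison\_rate}, the sign-of-gradient signature, the linear meta-classifier) are illustrative assumptions for exposition, not the authors' exact implementation; see the repository linked above for the actual code.

\begin{verbatim}
# Minimal sketch of the MOVE pipeline; all names and hyperparameters
# are illustrative assumptions, not the authors' implementation.
import copy
import random

import torch
import torch.nn as nn


def embed_external_features(dataset, style_transfer, poison_rate=0.1):
    """Tamper with a small fraction of training samples via style
    transfer, keeping their original labels, so the victim model also
    learns the defender-specified external features.

    `dataset` is a list of (image_tensor, label) pairs;
    `style_transfer` is any image-to-image transform (placeholder).
    """
    tampered = copy.deepcopy(dataset)
    picks = random.sample(range(len(tampered)),
                          int(poison_rate * len(tampered)))
    for i in picks:
        x, y = tampered[i]
        tampered[i] = (style_transfer(x), y)  # label left unchanged
    return tampered


def gradient_signature(model, x, y):
    """White-box setting: use the sign of the loss gradient w.r.t. the
    model parameters on one style-transferred sample as the input
    feature of the meta-classifier."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x.unsqueeze(0)),
                                       torch.tensor([y]))
    loss.backward()
    grads = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
    return torch.sign(grads)


def train_meta_classifier(victim, benign, transferred_samples,
                          epochs=20):
    """Binary meta-classifier: label 1 for gradient signatures of the
    victim model (which carries the external-feature knowledge) and 0
    for an independently trained benign model. At verification time, a
    suspicious model whose signatures are classified as 1 is flagged
    as stolen."""
    dim = sum(p.numel() for p in victim.parameters())
    meta = nn.Linear(dim, 2)
    optimizer = torch.optim.Adam(meta.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in transferred_samples:
            for model, label in ((victim, 1), (benign, 0)):
                signature = gradient_signature(model, x, y)
                loss = nn.functional.cross_entropy(
                    meta(signature.unsqueeze(0)), torch.tensor([label]))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return meta
\end{verbatim}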