With the rapid development of natural language processing (NLP) technology, NLP models have demonstrated great economic value in business. However, an owner's model is vulnerable to pirated redistribution, which breaks the symmetric relationship between model owners and consumers. A model protection mechanism is therefore needed to preserve this symmetry. Existing language model protection schemes based on black-box verification perform poorly in terms of trigger-sample invisibility: the trigger samples are easily detected by humans or anomaly detectors, which prevents verification. To solve this problem, this paper proposes triggerless-mode trigger samples for ownership verification. In addition, a thief may replace the classification module of a watermarked model to suit a specific classification task, thereby removing the watermark present in the model. This paper therefore identifies a new threat, in which the thief replaces the model's classification module and globally fine-tunes the model, and shows that model ownership can still be verified successfully through a white-box approach. We also use blockchain properties such as tamper resistance and traceability to prevent thieves from making fraudulent ownership claims. Experiments show that the proposed scheme verifies ownership with 100% watermark verification accuracy without degrading the model's original performance, and that it is strongly robust with a low false trigger rate.
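As an illustration of how tamper resistance and traceability can support an ownership claim, the following is a minimal sketch and not the scheme described in this paper. It assumes a toy append-only hash chain in place of a real blockchain, and the names `Ledger`, `register`, `is_valid`, and `first_claimant` are hypothetical. Each record stores a fingerprint of the owner's watermark evidence and is linked to the hash of the previous record, so a later edit to any record is detectable and the earliest registration of a fingerprint can be traced.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Ledger:
    """Toy append-only hash chain (hypothetical stand-in for a blockchain)."""

    blocks: list = field(default_factory=list)

    def register(self, owner: str, fingerprint: str, timestamp: int) -> dict:
        """Append an ownership record linked to the previous record's hash."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {
            "index": len(self.blocks),
            "owner": owner,
            "fingerprint": fingerprint,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        # The block hash covers every field, including the link to its predecessor.
        digest = sha256_hex(json.dumps(body, sort_keys=True).encode())
        block = dict(body, hash=digest)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash and link; any edited record breaks the chain."""
        prev_hash = "0" * 64
        for i, block in enumerate(self.blocks):
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            if sha256_hex(json.dumps(body, sort_keys=True).encode()) != block["hash"]:
                return False
            prev_hash = block["hash"]
        return True

    def first_claimant(self, fingerprint: str):
        """Return the owner of the earliest record for this fingerprint, if any."""
        for block in self.blocks:
            if block["fingerprint"] == fingerprint:
                return block["owner"]
        return None


if __name__ == "__main__":
    ledger = Ledger()
    # Assumption: the fingerprint is a hash of whatever watermark evidence the owner holds.
    fp = sha256_hex(b"watermark-evidence")
    ledger.register("owner", fp, timestamp=1)
    ledger.register("thief", fp, timestamp=2)  # a later, competing claim
    print(ledger.is_valid(), ledger.first_claimant(fp))
```

In this sketch, a thief who registers the same fingerprint later cannot displace the earlier record, and rewriting an earlier record invalidates the chain. A real deployment would rely on a distributed ledger's consensus rather than a single in-memory list.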