As an effective method for intellectual property (IP) protection, model watermarking has been applied to a wide variety of deep neural networks (DNNs), including speech classification models. However, designing a black-box watermarking scheme for automatic speech recognition (ASR) models remains an unsolved problem, even though such a scheme is much needed to protect remote ASR Application Programming Interfaces (APIs) deployed on cloud servers. Because of the conditional independence assumption of ASR models and the risk of label-detection-based evasion attacks against them, black-box watermarking schemes designed for speech classification models cannot be applied to ASR models. In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. Specifically, we synthesize trigger audios by spreading the model owners' speech clips over the entire input audios, and we label the trigger audios with stego texts, which hide the authorship information using linguistic steganography. Experiments on the state-of-the-art open-source ASR system DeepSpeech demonstrate the feasibility of the proposed watermarking scheme, which is robust against five kinds of attacks and has little impact on accuracy.
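The trigger-audio construction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the owner's speech clip is spread over the input audio, so the periodic additive mixing used here is an assumption, as are the names `spread_clip` and `make_trigger` and the parameters `period` and `gain`. The stego text is passed in as a plain string; generating it with linguistic steganography is outside the scope of this sketch.

```python
from typing import List, Tuple


def spread_clip(host: List[float], clip: List[float],
                period: int, gain: float = 0.5) -> List[float]:
    """Overlay `clip` onto `host` every `period` samples.

    Assumption: "spreading" is modeled as periodic additive mixing.
    Samples are floats in [-1, 1]; the result is clamped to that range.
    """
    if period < 1:
        raise ValueError("period must be a positive number of samples")
    out = list(host)  # copy so the host audio is left unchanged
    for start in range(0, len(host), period):
        for i, sample in enumerate(clip):
            j = start + i
            if j >= len(out):
                break  # the clip runs past the end of the host audio
            out[j] = max(-1.0, min(1.0, out[j] + gain * sample))
    return out


def make_trigger(host: List[float], clip: List[float], stego_text: str,
                 period: int, gain: float = 0.5) -> Tuple[List[float], str]:
    """Pair the synthesized trigger audio with its stego-text label."""
    return spread_clip(host, clip, period, gain), stego_text


# Toy usage: a silent 10-sample host and a 2-sample owner clip.
host_audio = [0.0] * 10
owner_clip = [1.0, -1.0]
trigger_audio, label = make_trigger(
    host_audio, owner_clip, "hypothetical stego text", period=4)
```

In a real setting the host and clip would be decoded waveforms at the ASR model's sample rate, and the resulting (audio, label) pairs would be added to the training set so that the watermarked model transcribes trigger audios as the stego texts.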