High-performance anti-spoofing models for automatic speaker verification (ASV) have been widely used to protect ASV systems by identifying and filtering spoofing audio that is deliberately generated by text-to-speech, voice conversion, audio replay, etc. However, it has been shown that high-performance anti-spoofing models are vulnerable to adversarial attacks. Adversarial examples, which are indistinguishable from the original data yet cause incorrect predictions, are dangerous for anti-spoofing models, and detecting them is therefore essential. To explore this issue, we propose to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario. Self-supervised learning models are effective in improving the performance of downstream tasks such as phone classification and ASR; however, their effectiveness in defending against adversarial attacks has not yet been explored. In this work, we investigate the robustness of high-level representations learned by self-supervised models by using them in the defense against adversarial attacks. A layerwise noise-to-signal ratio (LNSR) is proposed to quantify and measure the effectiveness of deep models in countering adversarial noise. Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples and successfully counter black-box attacks.
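The abstract does not give a formal definition of LNSR; a minimal sketch under the assumption that it is computed per layer as the norm of the adversarial perturbation in that layer's representation relative to the norm of the clean representation (the function name `lnsr` and the list-of-arrays interface are illustrative, not from the paper):

```python
import numpy as np

def lnsr(clean_reprs, adv_reprs):
    """Layerwise noise-to-signal ratio (assumed form).

    clean_reprs / adv_reprs: lists of per-layer representation arrays
    for a clean input and its adversarial counterpart. At each layer,
    the ratio ||h_adv - h_clean|| / ||h_clean|| measures how much
    adversarial noise survives; a decreasing ratio across depth would
    indicate that the model attenuates adversarial perturbations.
    """
    ratios = []
    for h_clean, h_adv in zip(clean_reprs, adv_reprs):
        noise = np.linalg.norm(h_adv - h_clean)   # perturbation energy at this layer
        signal = np.linalg.norm(h_clean)          # clean representation energy
        ratios.append(noise / signal)
    return ratios

# Illustrative usage with toy per-layer representations
clean = [np.array([3.0, 4.0]), np.array([1.0, 0.0, 0.0])]
adv = [np.array([3.0, 5.0]), np.array([1.0, 0.1, 0.0])]
ratios = lnsr(clean, adv)
```

Under this reading, comparing LNSR curves across models (e.g. Mockingjay layers versus the raw anti-spoofing model) would show where in the network adversarial noise is suppressed.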