Automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, and several countermeasures have been proposed to alleviate the threat. However, many defense approaches not only require prior knowledge of the attackers but also offer weak interpretability. To address this issue, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from genuine ones. It uses score variation as an indicator to detect adversarial examples, where the score variation is the absolute difference between the ASV score of an original audio recording and that of its transformed audio, which is synthesized from the recording's masked complex spectrogram. A core component of the score variation detector is a neural network that generates the masked spectrogram. The neural network needs only genuine examples for training, which makes the approach attacker-independent. Its interpretability lies in the training objective: the neural network is trained to minimize the score variation of the targeted ASV system while maximizing the number of masked spectrogram bins on the genuine training examples. The method is founded on the observation that masking out the vast majority of spectrogram bins that carry little speaker information inevitably introduces a large score variation for an adversarial example but only a small score variation for a genuine example. Experimental results with 12 attackers and two representative ASV systems show that the proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also serve as a benchmark for detection-based ASV defenses.
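The score-variation test described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the toy embedding and cosine scorer (`toy_embedding`, `toy_asv_score`) stand in for a real ASV system, the binary mask is hand-written rather than produced by a trained network, the score is computed directly on the masked spectrogram instead of on audio resynthesized from it, and the threshold of 0.1 is arbitrary.

```python
import numpy as np

def toy_embedding(spec):
    # Hypothetical stand-in for a speaker embedding: mean magnitude per frequency bin.
    return np.abs(spec).mean(axis=1)

def toy_asv_score(enroll_emb, spec):
    # Hypothetical stand-in for an ASV score: cosine similarity of embeddings.
    emb = toy_embedding(spec)
    denom = np.linalg.norm(enroll_emb) * np.linalg.norm(emb) + 1e-12
    return float(enroll_emb @ emb / denom)

def score_variation(enroll_emb, spec, mask):
    # Absolute difference between the score of the original input and the
    # score of its masked version (resynthesis to audio is skipped here).
    original = toy_asv_score(enroll_emb, spec)
    masked = toy_asv_score(enroll_emb, spec * mask)
    return abs(original - masked)

def is_adversarial(variation, threshold):
    # Flag the input when masking changes the score by more than the threshold.
    return variation > threshold

# Toy complex spectrograms with 8 frequency bins and 5 frames.
n_freq, n_frames = 8, 5
genuine = np.zeros((n_freq, n_frames), dtype=complex)
genuine[:2, :] = 1.0                      # assume speaker information sits in bins 0-1

enroll_emb = toy_embedding(genuine)       # enrollment from a clean recording

mask = np.zeros((n_freq, n_frames))
mask[:2, :] = 1.0                         # hand-written mask: keep bins 0-1, drop the rest

adversarial = genuine.copy()
adversarial[2:, :] += 0.5                 # toy perturbation in the masked-out bins

THRESHOLD = 0.1                           # arbitrary value chosen for this sketch
genuine_variation = score_variation(enroll_emb, genuine, mask)
adversarial_variation = score_variation(enroll_emb, adversarial, mask)

print("genuine:", genuine_variation, is_adversarial(genuine_variation, THRESHOLD))
print("adversarial:", adversarial_variation, is_adversarial(adversarial_variation, THRESHOLD))
```

In this toy setup the mask removes only bins that carry no speaker information, so the genuine score barely moves, while the perturbed input loses its perturbation under masking and its score shifts noticeably. In practice the threshold would be tuned on genuine data only, consistent with the attacker-independent setting.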