Speaker verification systems are widely used in smartphones and Internet of Things (IoT) devices to identify a legitimate user. Recent work has shown that adversarial attacks, such as FAKEBOB, are effective against speaker verification systems. The goal of this paper is to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. Specifically, our detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform of an audio signal and uses it as the detection metric. Through both analysis and experiments, we show that the proposed detector is easy to implement, processes input audio quickly, and is effective in determining whether an audio signal has been corrupted by FAKEBOB attacks. The experimental results indicate that the detector is extremely effective, achieving near-zero false positive and false negative rates when detecting FAKEBOB attacks against Gaussian mixture model (GMM) and i-vector speaker verification systems. Moreover, we discuss and study adaptive adversarial attacks against the proposed detector, together with their countermeasures, illustrating the game between attackers and defenders.
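The detection metric described above (the minimum, over short-time frames, of the energy in the high-frequency band) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window, the frame length, the hop size, the 4 kHz cutoff, and the threshold value are all assumptions chosen for illustration, as is the decision direction (flagging audio whose minimum high-frequency energy exceeds the threshold).

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude of a simple Hann-windowed STFT, shape (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def min_high_freq_energy(signal, sample_rate=16000, cutoff_hz=4000.0,
                         frame_len=512, hop=256):
    """Minimum over frames of the spectral energy at or above cutoff_hz.

    The cutoff and framing parameters are illustrative assumptions,
    not values taken from the paper.
    """
    mag = stft_magnitude(signal, frame_len=frame_len, hop=hop)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    high_band = mag[:, freqs >= cutoff_hz]
    energy_per_frame = np.sum(high_band ** 2, axis=1)
    return float(energy_per_frame.min())


def is_flagged(signal, threshold, **kwargs):
    """Assumed decision rule: flag audio whose metric exceeds the threshold."""
    return min_high_freq_energy(signal, **kwargs) > threshold


# Synthetic demonstration (not real speech and not a real FAKEBOB sample):
# a 200 Hz tone followed by silence, and the same signal plus small noise
# standing in for an adversarial perturbation.
sample_rate = 16000
t = np.arange(sample_rate // 2) / sample_rate
clean = np.concatenate([0.5 * np.sin(2 * np.pi * 200 * t),
                        np.zeros(sample_rate // 2)])
rng = np.random.default_rng(0)
perturbed = clean + rng.uniform(-0.002, 0.002, size=clean.shape)

clean_score = min_high_freq_energy(clean, sample_rate=sample_rate)
perturbed_score = min_high_freq_energy(perturbed, sample_rate=sample_rate)
```

In this toy setup the silent frames of the clean signal carry no high-frequency energy, whereas the added noise is present in every frame, so the minimum is larger for the perturbed signal. In practice the threshold would have to be calibrated on real recordings.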