In this paper, we propose dictionary attacks against speaker verification, a novel attack vector that aims to match a large fraction of the speaker population by chance. We introduce a generic formulation of the attack that can be used with various speech representations and threat models. The attacker uses adversarial optimization to maximize the raw similarity between the speaker embedding of a seed speech sample and those of a proxy population. The resulting master voice successfully matches a non-trivial fraction of people in an unknown population. Adversarial waveforms obtained with our approach can match, on average, 69% of females and 38% of males enrolled in the target system at a strict decision threshold calibrated to yield a false alarm rate of 1%. By using the attack with a black-box voice cloning system, we obtain master voices that are effective in the most challenging conditions and transferable between speaker encoders. We also show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
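The core optimization described above, ascending on a waveform to maximize its mean embedding similarity against a proxy population, can be sketched as follows. This is a minimal illustration only: the linear "encoder", dimensions, step size, and iteration count are all hypothetical stand-ins, and a real attack would backpropagate through a pretrained neural speaker encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-in for a speaker encoder: embedding = W @ waveform.
# A real attack would differentiate through a pretrained speaker encoder.
D_WAV, D_EMB, N_PROXY = 400, 32, 25
W = rng.standard_normal((D_EMB, D_WAV)) / np.sqrt(D_WAV)

def encode(x):
    return W @ x

def mean_cosine(x, proxies):
    """Mean cosine similarity between encode(x) and unit-norm proxy embeddings."""
    e = encode(x)
    return float(np.mean(proxies @ e) / np.linalg.norm(e))

def grad_mean_cosine(x, proxies):
    """Analytic gradient of the mean cosine similarity w.r.t. the waveform x."""
    e = encode(x)
    n = np.linalg.norm(e)
    pbar = proxies.mean(axis=0)
    # d/de of (pbar . e) / ||e||  =  pbar/||e|| - (pbar . e) e / ||e||^3
    g_e = pbar / n - (pbar @ e) * e / n**3
    return W.T @ g_e  # chain rule through the linear encoder

# Proxy population: unit-norm embeddings standing in for enrolled speakers.
proxies = rng.standard_normal((N_PROXY, D_EMB))
proxies /= np.linalg.norm(proxies, axis=1, keepdims=True)

# Seed waveform, then gradient ascent toward the proxy population.
seed = rng.standard_normal(D_WAV)
x = seed.copy()
before = mean_cosine(seed, proxies)
for _ in range(500):
    x += 0.05 * grad_mean_cosine(x, proxies)
after = mean_cosine(x, proxies)
print(after > before)  # the optimized waveform matches the population better
```

The same loop applies unchanged to any differentiable speaker encoder; only `encode` and its gradient computation change. Thresholding `proxies @ e / ||e||` against the verifier's decision threshold then gives the fraction of the proxy population the master voice would falsely match.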