Speaker verification is widely used in many authentication scenarios. However, training speaker verification models requires large amounts of data and computing power, so users often rely on untrusted third-party data or directly deploy third-party models, which introduces security risks. In this paper, we propose a backdoor attack targeting this scenario. Specifically, for the Siamese network in a speaker verification system, we implant a universal identity into the model that can impersonate any enrolled speaker and pass verification. The attacker therefore needs no knowledge of the victim, which makes the attack more flexible and stealthy. In addition, we design and compare three ways of selecting attacker utterances and two ways of poisoned training for the GE2E loss function under different scenarios. Results on the TIMIT and VoxCeleb1 datasets show that our approach achieves a high attack success rate while preserving normal verification accuracy. Our work reveals the vulnerability of speaker verification systems and provides a new perspective for further improving their robustness.
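To make the attack surface concrete, the sketch below shows how an embedding-based (Siamese-style) verification system typically scores a trial: enrollment utterances are averaged into a speaker centroid, and a test utterance is accepted if its d-vector is sufficiently close to that centroid. The goal of the backdoor described above is that the attacker's utterances land close to every enrolled centroid so this check passes. The encoder, threshold, and feature dimensions here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def enroll(encoder: nn.Module, utterances: torch.Tensor) -> torch.Tensor:
    """Average the L2-normalized d-vectors of the enrollment utterances
    to obtain the enrolled speaker's centroid."""
    embeddings = F.normalize(encoder(utterances), dim=-1)   # (n_utt, d)
    return F.normalize(embeddings.mean(dim=0), dim=0)       # (d,)

def verify(encoder: nn.Module, centroid: torch.Tensor,
           test_utterance: torch.Tensor, threshold: float = 0.7) -> bool:
    """Accept the identity claim if the cosine similarity between the
    test d-vector and the enrolled centroid exceeds the threshold."""
    test_emb = F.normalize(encoder(test_utterance.unsqueeze(0)), dim=-1)[0]
    return float(torch.dot(test_emb, centroid)) > threshold

# Toy usage with a stand-in encoder (a real system would use an LSTM/TDNN
# d-vector network trained with the GE2E loss):
encoder = nn.Linear(40, 256)        # maps 40-dim acoustic features to d-vectors
enrollment = torch.randn(5, 40)     # 5 enrollment utterances (feature vectors)
centroid = enroll(encoder, enrollment)
print(verify(encoder, centroid, torch.randn(40)))

# A successful backdoor means the poisoned encoder maps the attacker's
# utterances close to *every* enrolled centroid, so verify() returns True
# regardless of which speaker is enrolled.
```

The decision rule itself is standard for d-vector verification; the paper's contribution lies in how the encoder is poisoned during GE2E training, which is not reproduced in this sketch.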