In this work, we propose a multi-target backdoor attack against speaker identification that uses position-independent clicking sounds as triggers. Unlike previous single-target approaches, our method targets up to 50 speakers simultaneously, achieving success rates of up to 95.04%. To simulate more realistic attack conditions, we vary the signal-to-noise ratio between speech and trigger, demonstrating a trade-off between stealth and effectiveness. We further extend the attack to the speaker verification task by selecting the most similar training speaker, as measured by cosine similarity, as a proxy target. The attack is most effective when the target and enrolled speakers are highly similar, reaching success rates of up to 90% in such cases.
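The two mechanisms named above, mixing a trigger into speech at a chosen signal-to-noise ratio and picking a proxy target by cosine similarity, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that SNR is defined as the ratio of whole-utterance speech RMS to trigger RMS in decibels, that speakers are represented by fixed-length embedding vectors, and that similarity is measured against the enrolled speaker. All function names and the `offset` parameter are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def trigger_gain(speech, trigger, snr_db):
    """Gain for `trigger` so that 20*log10(rms(speech) / rms(gain*trigger)) == snr_db.

    Assumption: SNR is measured over the whole utterance, not per frame.
    """
    trigger_rms = rms(trigger)
    if trigger_rms == 0.0:
        raise ValueError("trigger is silent; cannot scale it to a target SNR")
    return rms(speech) / (trigger_rms * 10 ** (snr_db / 20))


def mix_at_snr(speech, trigger, snr_db, offset=0):
    """Return a copy of `speech` with the scaled trigger added at sample `offset`.

    `offset` is a hypothetical parameter: a position-independent trigger
    could be placed anywhere that fits inside the utterance.
    """
    if offset < 0 or offset + len(trigger) > len(speech):
        raise ValueError("trigger does not fit inside the utterance at this offset")
    gain = trigger_gain(speech, trigger, snr_db)
    mixed = list(speech)
    for i, sample in enumerate(trigger):
        mixed[offset + i] += gain * sample
    return mixed


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_proxy_target(enrolled_embedding, train_embeddings):
    """Return the ID of the training speaker most similar to the enrolled one.

    Assumption: `train_embeddings` maps speaker ID -> embedding vector.
    """
    return max(
        train_embeddings,
        key=lambda spk: cosine_similarity(enrolled_embedding, train_embeddings[spk]),
    )
```

In this sketch a lower `snr_db` gives a louder trigger relative to the speech, which corresponds to the less stealthy end of the trade-off described above.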