Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation-based and adversarial-training-based defenses for securing SRSs. Based on the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them against 7 recent, promising adversarial attacks on speaker recognition (4 white-box and 3 black-box). Following best practices in defense evaluation, we analyze how well the transformations withstand adaptive attacks. We also evaluate their effectiveness against adaptive attacks when combined with adversarial training. Our study yields many useful insights and findings, many of which are new or inconsistent with conclusions from the image and speech recognition domains. For example, variable and constant bit rate speech compression perform differently, and some non-differentiable transformations remain effective against current promising evasion techniques that often work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is more effective than adversarial training alone in a complete white-box setting, e.g., increasing accuracy by 13.62% and attack cost by two orders of magnitude, whereas other transformations do not necessarily improve the overall defense capability. This work sheds further light on research directions in this field. We also release our evaluation platform, SPEAKERGUARD, to foster further research.