A backdoor attack plants triggers in a victim's deep learning model to enable targeted misclassification at testing time. In general, triggers are fixed artifacts attached to samples, making backdoor attacks easy to spot. Only recently has a new generation of triggers that is harder to detect been proposed: stylistic triggers, which apply stylistic transformations to the input samples (e.g., a specific writing style). Currently, the stylistic backdoor literature lacks a proper formalization of the attack, which we establish in this paper. Moreover, most studies of stylistic triggers focus on text and images, and it is not yet understood whether they can work in the audio domain. This work fills that gap. We propose JingleBack, the first stylistic backdoor attack based on audio transformations such as chorus and gain. Using 444 models in a speech classification task, we confirm the feasibility of stylistic triggers in audio, achieving a 96% attack success rate.
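To make the notion of a stylistic audio trigger concrete, here is a minimal sketch of the core idea: a fixed effect chain (gain plus chorus) applied to every poisoned waveform, so the model can associate that acoustic style with the attacker's target label. The library (Spotify's pedalboard), the sample rate, and all parameter values are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a stylistic audio trigger: a fixed effect chain applied to
# a waveform before poisoning. Library and parameters are assumptions,
# not the exact setup used in the JingleBack paper.
import numpy as np
from pedalboard import Pedalboard, Gain, Chorus

SAMPLE_RATE = 16_000  # assumed; typical for speech classification data

# The "style" acting as the trigger: the same effect chain is applied
# to every poisoned sample.
trigger_style = Pedalboard([
    Gain(gain_db=6.0),                # boost loudness
    Chorus(rate_hz=1.0, depth=0.25),  # add a chorus texture
])

def apply_stylistic_trigger(waveform: np.ndarray) -> np.ndarray:
    """Return a styled copy of a mono float32 waveform."""
    return trigger_style(waveform, SAMPLE_RATE)

# Example: poison a one-second dummy clip.
clean = np.random.randn(SAMPLE_RATE).astype(np.float32)
poisoned = apply_stylistic_trigger(clean)
```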