Keyword spotting (KWS) is widely used in speech control scenarios. KWS models are usually based on deep neural networks and require a large amount of training data, so manufacturers often train them on third-party data. However, deep neural networks are not sufficiently interpretable to manufacturers, and attackers can manipulate third-party training data to plant backdoors during model training. An effective backdoor attack forces the model to make attacker-specified judgments under certain conditions, known as triggers. In this paper, we design a backdoor attack scheme for KWS based on Pitch Boosting and Sound Masking, called PBSM. Experimental results demonstrate that PBSM achieves an average attack success rate close to 90% on three victim models while poisoning less than 1% of the training data.