Speech can easily be leaked imperceptibly, for example by being recorded with a mobile phone in everyday situations. Private content in speech may then be maliciously extracted through speech enhancement technology. Speech enhancement has developed rapidly alongside deep neural networks (DNNs), but adversarial examples can cause DNNs to fail. In this work, we propose an adversarial method to degrade speech enhancement systems. Experimental results show that the generated adversarial examples can, after speech enhancement, erase most of the content information in the original examples or replace it with target speech content. The word error rate (WER) between the recognition results of an enhanced original example and an enhanced adversarial example can reach 89.0%. For the targeted attack, the WER between the enhanced adversarial example and the target example is as low as 33.75%. The adversarial perturbation brings the rate of change of the original example to more than 1.4430. This work can help prevent the malicious extraction of speech.
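To make the attack idea concrete, the following is a minimal sketch of a gradient-based (PGD-style) perturbation loop against a speech enhancement network, assuming a differentiable PyTorch model `enhancer` that maps a waveform to an enhanced waveform. The interface, the `mse_loss` objective, and all hyperparameter values are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def pgd_attack(enhancer, x, target=None, eps=0.002, alpha=1e-4, steps=100):
    """Craft an adversarial perturbation against a speech enhancement model.

    enhancer: hypothetical differentiable enhancement network
              (waveform in, enhanced waveform out).
    x:        original waveform, shape (batch, samples).
    target:   optional target waveform; if given, the enhanced adversarial
              example is pushed toward it (targeted attack), otherwise the
              enhanced output is pushed away from the enhanced original
              (untargeted attack, erasing content after enhancement).
    """
    with torch.no_grad():
        y_clean = enhancer(x)  # enhancement result of the original example

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        y_adv = enhancer(x + delta)
        if target is None:
            # Untargeted: maximize the distortion of the enhanced output.
            loss = -F.mse_loss(y_adv, y_clean)
        else:
            # Targeted: make the enhanced output match the target speech.
            loss = F.mse_loss(y_adv, target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient descent step
            delta.clamp_(-eps, eps)             # keep perturbation imperceptible
        delta.grad.zero_()
    return (x + delta).detach()
```

The L-infinity bound `eps` controls imperceptibility of the perturbation, while the success of the attack would be measured, as in the abstract, by the WER between recognition results of the enhanced original and enhanced adversarial examples.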