Although end-to-end automatic speech recognition (e2e ASR) models are widely deployed in many applications, few studies have examined their robustness against adversarial perturbations. In this paper, we explore whether a targeted universal perturbation vector exists for e2e ASR models. Our goal is to find perturbations that can mislead the models into predicting a given target transcript, such as "thank you" or the empty string, on any input utterance. We study two different attacks, namely additive and prepending perturbations, and their performance against the state-of-the-art LAS, CTC, and RNN-T models. We find that LAS is the most vulnerable to perturbations among the three models. RNN-T is more robust against additive perturbations, especially on long utterances, and CTC is robust against both additive and prepending perturbations. To attack RNN-T, we find that prepending perturbation is more effective than additive perturbation and can mislead the model into predicting the same short target on utterances of arbitrary length.
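The two attack types differ only in how the universal perturbation is combined with the input waveform. As a minimal illustrative sketch (the function names and signal shapes below are hypothetical, not from the paper's implementation): an additive perturbation is summed elementwise with the utterance, while a prepending perturbation is concatenated in front of it.

```python
import numpy as np

def additive_perturbation(x, delta):
    """Additive attack: the universal perturbation delta is added
    elementwise to the waveform; here delta is tiled/truncated to
    match the utterance length (one possible convention)."""
    reps = int(np.ceil(len(x) / len(delta)))
    d = np.tile(delta, reps)[: len(x)]
    return x + d

def prepending_perturbation(x, delta):
    """Prepending attack: the perturbation audio is concatenated
    before the utterance; the original samples are left untouched,
    so it applies unchanged to utterances of arbitrary length."""
    return np.concatenate([delta, x])

# Hypothetical example: a 1 s utterance and a 0.5 s perturbation at 16 kHz.
x = np.zeros(16000)            # silence stands in for an utterance
delta = 0.01 * np.ones(8000)   # small-amplitude universal perturbation
assert additive_perturbation(x, delta).shape == x.shape
assert prepending_perturbation(x, delta).shape == (24000,)
```

In an actual attack, delta would be optimized over a training set of utterances so that the ASR model emits the target transcript; the snippet only shows how each perturbation is applied at inference time.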

