In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR) and build an AR-SCR system on top of it. The AR procedure modifies the acoustic signals from the target domain so that a pretrained SCR model from the source domain can be repurposed for the target task. To resolve the label mismatch between the source and target domains, and to further improve the stability of AR, we introduce a similarity-based label mapping technique that aligns source classes with target classes. In addition, we combine transfer learning (TL) with the original AR process to improve the model's adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets: Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that, using a pretrained acoustic model (AM) trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on the Arabic and Lithuanian speech commands datasets while using only a limited amount of training data.
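The interplay between the AR input transformation and label mapping can be illustrated with a toy NumPy sketch. Every detail here is an assumption made for illustration and is not taken from the paper. A linear classifier stands in for the frozen pretrained SCR model. Inputs are fixed-length feature vectors rather than waveforms. The trainable reprogramming is a single additive perturbation `delta` shared by all inputs. The similarity score (the frozen model's mean source-class probability on each target class's samples), combined with a greedy many-to-one assignment, is only one plausible reading of "similarity-based label mapping" and may differ from the authors' formulation. Only `delta` is trained; the source weights never change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "pretrained source model": a linear classifier over
# D-dim features with S source classes. Its weights are never updated.
D, S, T = 16, 12, 3          # feature dim, source classes, target classes
W_src = rng.normal(size=(S, D))
b_src = rng.normal(size=S)

def source_logits(x):
    """Frozen source model: (N, D) features -> (N, S) source logits."""
    return x @ W_src.T + b_src

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def similarity_label_map(x, y, k):
    """Assign k distinct source classes to each target class (assumed scheme).

    Similarity = mean source-class probability of the frozen model on the
    samples of each target class; assignment is greedy, round-robin over
    target classes, and each source class is used at most once.
    """
    probs = softmax(source_logits(x))
    sim = np.stack([probs[y == t].mean(axis=0) for t in range(T)])  # (T, S)
    M = np.zeros((T, S))
    taken = set()
    for _ in range(k):
        for t in range(T):
            s = next(j for j in np.argsort(-sim[t]) if j not in taken)
            taken.add(s)
            M[t, s] = 1.0 / k   # target logit = average of its k source logits
    return M

def target_loss(x, y, M, delta):
    """Cross-entropy of target labels under reprogrammed input + label map."""
    p = softmax(source_logits(x + delta) @ M.T)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def train_reprogram(x, y, M, steps=300, lr=0.02):
    """Gradient descent on the additive input perturbation only."""
    delta = np.zeros(x.shape[1])
    onehot = np.eye(T)[y]
    for _ in range(steps):
        p = softmax(source_logits(x + delta) @ M.T)   # (N, T) target probs
        grad_z = (p - onehot) / len(x)                # dLoss/d(target logits)
        # Chain rule through the label map M and the frozen weights W_src;
        # delta is shared by every sample, so per-sample gradients are summed.
        grad_delta = (grad_z @ M @ W_src).sum(axis=0)
        delta -= lr * grad_delta
    return delta

# Purely synthetic target-domain data: T classes of noisy feature vectors.
N = 60
y = np.arange(N) % T
centers = rng.normal(size=(T, D))
x = centers[y] + 0.3 * rng.normal(size=(N, D))

M = similarity_label_map(x, y, k=2)
loss_before = target_loss(x, y, M, np.zeros(D))
delta = train_reprogram(x, y, M)
loss_after = target_loss(x, y, M, delta)
print(f"target loss before AR: {loss_before:.3f}, after AR: {loss_after:.3f}")
```

Because the target logits are affine in `delta` and cross-entropy is convex in the logits, this toy objective is convex in `delta`, so small-step gradient descent steadily lowers the loss. The TL component described in the abstract would additionally unfreeze some source-model parameters (here, `W_src` or `b_src`) during training. That step is omitted from this sketch.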