Neural Machine Translation (NMT) systems are used in various applications. However, they have been shown to be vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence while maintaining similarity between the original sentence and the perturbed one in the source domain. To this end, we formulate an optimization problem that combines an adversarial loss term with a similarity term, and we use gradient projection in the embedding space to craft the adversarial sentence. Experimental results show that our attack outperforms Seq2Sick, an existing targeted adversarial attack against NMT models, in terms of both success rate and the induced drop in translation quality. Our attack succeeds in inserting the target keyword into the translation for more than 75% of sentences while preserving similarity with the original sentence.
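The gradient-projection procedure can be pictured as follows. This is a minimal PyTorch sketch under stated assumptions, not the paper's exact formulation: `adv_loss_fn` stands in for the term that pushes the target keyword into the translation, `sim_loss_fn` for the similarity term, and the weight `lam`, learning rate `lr`, and step count are illustrative placeholders.

```python
import torch

def craft_adversarial(emb, emb_matrix, adv_loss_fn, sim_loss_fn,
                      lam=1.0, lr=0.1, steps=50):
    """Gradient projection in the embedding space (illustrative sketch).

    emb:        (seq_len, dim) embeddings of the original sentence
    emb_matrix: (vocab_size, dim) the model's token embedding table
    """
    ids = None
    for _ in range(steps):
        emb = emb.detach().requires_grad_(True)
        # Weighted combination of the adversarial and similarity terms.
        loss = adv_loss_fn(emb) + lam * sim_loss_fn(emb)
        loss.backward()
        with torch.no_grad():
            # Gradient step on the continuous embeddings ...
            emb = emb - lr * emb.grad
            # ... then project each embedding onto the nearest row of the
            # embedding table, so the result decodes to a discrete sentence.
            ids = torch.cdist(emb, emb_matrix).argmin(dim=-1)
            emb = emb_matrix[ids]
    return ids  # token ids of the crafted adversarial sentence
```

The projection step is what keeps the optimization in the discrete text domain: after every continuous update, each perturbed embedding is snapped back to a valid token embedding.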