While end-to-end neural machine translation (NMT) has achieved impressive progress, NMT models are often fragile and unstable when given noisy input. Using adversarial examples as augmented training data has proven useful for alleviating this problem. Existing methods for adversarial example generation (AEG) operate at the word or character level and ignore the ubiquitous phrase structure of language. In this paper, we propose a Phrase-level Adversarial Example Generation (PAEG) framework to enhance the robustness of translation models. PAEG extends the gradient-based word-level AEG method with a phrase-level substitution strategy. We evaluate our method on three benchmarks: LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German. Experimental results demonstrate that our approach significantly improves translation performance and robustness to noise over strong prior baselines.
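The abstract does not specify how phrase-level substitutes are scored. Purely as an illustration, the sketch below assumes a first-order (HotFlip-style) criterion of the kind used in gradient-based word-level AEG. The estimated loss increase from swapping embedding e(w) for e(w') at position i is (e(w') - e(w)) · g_i, where g_i is the loss gradient with respect to that position's embedding. The per-token estimates are then summed over a phrase span. The candidate phrase list, the same-length restriction, and every function name here are hypothetical assumptions; this is not the PAEG algorithm itself.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def first_order_gain(old: Vector, new: Vector, grad: Vector) -> float:
    """Linear estimate of the loss increase when embedding `old` is replaced by `new`."""
    return dot([n - o for n, o in zip(new, old)], grad)


def best_phrase_substitution(
    tokens: List[str],
    span: Tuple[int, int],
    candidates: List[List[str]],
    emb: Dict[str, Vector],
    grads: List[Vector],
) -> Tuple[List[str], float]:
    """Replace tokens[start:end] with the candidate phrase whose summed
    first-order gain is largest (hypothetical helper, not from the paper)."""
    start, end = span
    # Keep the original phrase unless some candidate is estimated to raise the loss.
    best_phrase, best_gain = tokens[start:end], 0.0
    for phrase in candidates:
        if len(phrase) != end - start:
            # Assumption: only same-length substitutes, so gradients align position by position.
            continue
        gain = sum(
            first_order_gain(emb[tokens[i]], emb[w], grads[i])
            for i, w in zip(range(start, end), phrase)
        )
        if gain > best_gain:
            best_phrase, best_gain = phrase, gain
    return tokens[:start] + list(best_phrase) + tokens[end:], best_gain


if __name__ == "__main__":
    # Toy 2-d embeddings and per-position gradients, invented for illustration.
    emb = {"the": [0, 0], "big": [1, 0], "house": [0, 1],
           "large": [1, 1], "home": [0, 2], "tiny": [-1, 0], "hut": [0, -1]}
    tokens = ["the", "big", "house"]
    grads = [[0.0, 0.0], [0.5, 0.5], [0.2, 1.0]]
    print(best_phrase_substitution(tokens, (1, 3), [["large", "home"], ["tiny", "hut"]], emb, grads))
    # → (['the', 'large', 'home'], 1.5)
```

Scoring a whole span jointly, rather than choosing each word independently, is one simple way to keep a substitute phrase intact. In a real NMT model the gradients would come from backpropagation through the encoder's embedding layer, and candidates would come from some phrase-level resource; both are outside the scope of this sketch.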