Many machine translation models are trained on bilingual corpora, which consist of aligned sentence pairs from two different languages with the same semantics. However, there is a qualitative discrepancy between the training and test sets of a bilingual corpus. While most training sentences are created via automatic techniques such as crawling and sentence-alignment methods, the test sentences are annotated by humans with fluency in mind. We suppose that this discrepancy in the training corpus causes a performance drop in the translation model. In this work, we define \textit{fluency noise} to identify which parts of training sentences cause them to seem unnatural. We show that \textit{fluency noise} can be detected by a simple gradient-based method with a pre-trained classifier. By removing \textit{fluency noise} from the training sentences, our final model outperforms the baseline on WMT-14 DE$\rightarrow$EN and RU$\rightarrow$EN. We also show compatibility with back-translation augmentation, which has been commonly used to improve the fluency of translation models. Finally, a qualitative analysis of \textit{fluency noise} provides insight into which points we should focus on.