Synthetic and Natural Noise Both Break Neural Machine Translation

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
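To make the two ideas in the abstract concrete, here is a minimal sketch in Python. The abstract does not say which noise types or which word representation are used, so everything here is an illustrative assumption: the function names (`swap_adjacent`, `scramble_middle`, `add_noise`, `bag_of_chars`) are hypothetical, the two noise functions are just common examples of synthetic character-level noise, and the character-count representation is a simple stand-in for a "structure-invariant" representation, since it ignores character order.

```python
import random
from collections import Counter


def swap_adjacent(word, rng):
    """Swap one random pair of adjacent interior characters (hypothetical noise type)."""
    if len(word) < 4:
        return word  # too short to have two interior characters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both interior positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def scramble_middle(word, rng):
    """Shuffle all interior characters, keeping the first and last in place."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def add_noise(sentence, noise_fn, seed=0):
    """Apply a word-level noise function to every whitespace-separated token."""
    rng = random.Random(seed)
    return " ".join(noise_fn(word, rng) for word in sentence.split())


def bag_of_chars(word):
    """Order-invariant word representation: character counts only (an assumption)."""
    return Counter(word)


if __name__ == "__main__":
    clean = "character based translation models are brittle"
    noisy = add_noise(clean, scramble_middle, seed=1)
    print(noisy)
    # The order-invariant representation is identical for clean and scrambled words.
    print(all(bag_of_chars(c) == bag_of_chars(n)
              for c, n in zip(clean.split(), noisy.split())))
```

Under these assumptions, robust training would amount to applying something like `add_noise` to the source side of the training data, while a representation like `bag_of_chars` is unaffected by scrambling by construction. It cannot help with noise that changes which characters a word contains, such as typos that substitute one letter for another.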


