Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
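As a hedged illustration of what "synthetic noise" might look like in this setting, the sketch below perturbs words by swapping one pair of adjacent interior characters, a perturbation humans typically read through easily. The function name and details are illustrative assumptions, not the paper's exact procedure.

```python
import random

def swap_noise(word: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent interior characters.

    First and last characters are kept fixed, mimicking the kind of
    scrambling humans can still comprehend. Words shorter than four
    characters are returned unchanged. (Illustrative sketch only.)
    """
    if len(word) < 4:
        return word
    # Choose an interior position i so that both i and i+1 are interior.
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def noisy_sentence(sentence: str, seed: int = 0) -> str:
    """Apply swap_noise independently to each whitespace-separated word."""
    rng = random.Random(seed)
    return " ".join(swap_noise(w, rng) for w in sentence.split())
```

Such perturbations preserve each word's character multiset and its first and last letters, which is one reason structure-invariant (e.g. bag-of-characters) representations can be robust to them.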