Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks and the introduction and refinement of the attention mechanism, encoder-decoder remains the de facto neural network architecture for state-of-the-art models. While the motivation for decoding information from some hidden space is straightforward, the strict separation of the encoding and decoding steps into an encoder and a decoder in the model architecture is not necessarily required. Compared to the task of autoregressive language modeling in the target language, machine translation simply has an additional source sentence as context. Given that neural language models nowadays can already handle rather long contexts in the target language, it is natural to ask whether simply concatenating the source and target sentences and training a language model on the result would suffice for translation. In this work, we investigate this idea for machine translation. Specifically, we experiment with bilingual translation, translation with additional target-side monolingual data, and multilingual translation. In all cases, this alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
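To make the concatenation idea concrete, the following is a minimal sketch, not the authors' implementation: build a single sequence of the form source, separator, target, end-of-sequence, and train a decoder-only (causal) language model on it, computing the loss only over target-side positions. The special token ids, the `TinyCausalLM` module, and all hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

PAD, SEP, EOS, VOCAB = 0, 1, 2, 32000  # assumed special ids / vocab size

class TinyCausalLM(nn.Module):
    """A small decoder-only LM: embedding -> causal Transformer -> vocab logits."""
    def __init__(self, vocab=VOCAB, d_model=256, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, nlayers)
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, ids):
        L = ids.size(1)
        # Causal mask so each position attends only to earlier positions.
        causal = nn.Transformer.generate_square_subsequent_mask(L)
        h = self.blocks(self.embed(ids), mask=causal)
        return self.proj(h)

def make_example(src_ids, tgt_ids):
    """Concatenate source and target into one LM training sequence."""
    ids = src_ids + [SEP] + tgt_ids + [EOS]
    # Supervise only the target side: mask source positions out of the loss.
    labels = [-100] * (len(src_ids) + 1) + tgt_ids + [EOS]
    return torch.tensor([ids]), torch.tensor([labels])

model = TinyCausalLM()
ids, labels = make_example([5, 6, 7], [8, 9])   # toy token ids
logits = model(ids)                             # shape (1, L, VOCAB)
# Standard next-token shift: position t predicts labels[t + 1].
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
loss.backward()
```

At inference time, under the same assumptions, one would feed the source sentence followed by the separator and decode target tokens autoregressively until `EOS`; whether to also apply the LM loss on source tokens is a design choice the sketch resolves by masking them out.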