Pretrained large character-level language models have recently been revitalized and shown to be competitive with subword models across a range of NLP tasks. However, no prior work has demonstrated their effectiveness in neural machine translation (NMT). This work performs an extensive comparison, across multiple languages and experimental conditions, of state-of-the-art character- and subword-level pretrained models (ByT5 and mT5, respectively) on NMT, and shows that the former are not only effective in translation but frequently outperform subword models, particularly when training data is limited. The only drawback of character models appears to be their inefficiency: they are at least four times slower in both training and inference. Further analysis indicates that character models are capable of implicitly translating at the word or subword level, thereby nullifying a major potential weakness of operating at the character level.
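As a minimal illustration of why character-level models incur this inefficiency, the sketch below (assuming the Hugging Face transformers library and the public google/byt5-small and google/mt5-small checkpoints) compares input lengths: ByT5 tokenizes text into one token per UTF-8 byte, while mT5 uses a SentencePiece subword vocabulary, so ByT5 sequences are several times longer and correspondingly more expensive to process.

```python
from transformers import AutoTokenizer

# ByT5 operates directly on UTF-8 bytes: one token per byte, plus specials.
byt5_tok = AutoTokenizer.from_pretrained("google/byt5-small")
# mT5 uses a large SentencePiece subword vocabulary.
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-small")

text = "Character-level models translate byte by byte."
byt5_ids = byt5_tok(text).input_ids
mt5_ids = mt5_tok(text).input_ids

# ByT5 produces roughly one id per byte of input; mT5 a handful of subwords.
print(f"ByT5 tokens: {len(byt5_ids)}")  # ~ number of bytes + end-of-sequence
print(f"mT5 tokens:  {len(mt5_ids)}")   # far fewer subword pieces
```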