While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages. Commercial translation systems support only one hundred languages or fewer, and do not make these models available for transfer to low-resource languages. In this work, we present useful tools for machine translation research: MTData, NLCodec, and RTG. We demonstrate their usefulness by creating a multilingual neural machine translation model capable of translating from 500 source languages into English. We make this multilingual model readily downloadable and usable as a service, or as a parent model for transfer learning to even lower-resource languages.