Multilingual NMT has become an attractive solution for MT deployment in production. However, matching bilingual quality comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several "light decoder" architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments demonstrate that combining a shallow decoder with vocabulary filtering yields inference that is more than twice as fast, with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), as well as robustness and human evaluations.