We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best-performing system on the WMT'21 English - Chinese news translation task using only five examples of English - Chinese parallel data at inference. Moreover, our approach to building these models does not require joint multilingual training or back-translation, is conceptually simple, and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors that impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation: we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.
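To make the inference-time setup concrete, the following is a minimal sketch of how five parallel demonstrations might be formatted into a single prompt for a decoder-only language model. The prompt template, language tags, and example sentences here are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical few-shot prompt construction for a decoder-only
# translation model: k demonstration pairs are concatenated, followed
# by the source sentence, and the model continues the text after the
# final target-language tag.

def build_few_shot_prompt(demos, source, src_lang="English", tgt_lang="Chinese"):
    """Format demonstration pairs plus the sentence to translate
    into one prompt string. `demos` is a list of (source, target) pairs."""
    lines = []
    for src, tgt in demos:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The unanswered final tag is where the model's translation begins.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

# Five illustrative high-quality demonstrations (made-up examples).
demos = [
    ("Hello.", "你好。"),
    ("Thank you very much.", "非常感谢。"),
    ("Good morning.", "早上好。"),
    ("How are you?", "你好吗?"),
    ("See you tomorrow.", "明天见。"),
]
prompt = build_few_shot_prompt(demos, "The weather is nice today.")
```

Because the demonstrations are supplied only at inference, swapping in examples of a particular regional variety or formality level is what lets the same frozen model be steered toward those attributes.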