Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning on more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (+7.4% absolute accuracy in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of the 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We conduct an in-depth analysis of different multilingual prompting approaches, showing in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples. Finally, we evaluate our models on social value tasks such as hate speech detection in five languages and find they have limitations similar to comparably sized GPT-3 models.
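To make the prompting setup concrete, the sketch below illustrates how a k-shot multilingual prompt could be assembled: an English template is reused across languages, while the demonstration examples may come from a different language than the test input. This is a minimal illustration, not the paper's exact implementation; the template wording and helper names are assumptions.

```python
# Minimal sketch of k-shot prompt construction with cross-lingual demonstrations.
# Function name, template string, and label verbalizers are illustrative assumptions.

def build_few_shot_prompt(demonstrations, test_example,
                          template="{premise}, right? {label}, {hypothesis}"):
    """Concatenate k labeled demonstrations followed by one unlabeled test example."""
    parts = []
    for ex in demonstrations:
        parts.append(template.format(premise=ex["premise"],
                                      label=ex["label"],
                                      hypothesis=ex["hypothesis"]))
    # The test example is rendered only up to the label slot; a language model
    # would then be scored on each candidate label completion to make a prediction.
    parts.append(f'{test_example["premise"]}, right?')
    return "\n".join(parts)

# Example: English demonstrations, Spanish test input (cross-lingual transfer).
demos = [
    {"premise": "The cat sat on the mat", "hypothesis": "A cat is resting",
     "label": "Yes"},
    {"premise": "He bought a car", "hypothesis": "He sold his bike",
     "label": "No"},
]
test = {"premise": "El niño juega en el parque",
        "hypothesis": "Un niño está afuera"}
print(build_few_shot_prompt(demos, test))
```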