Multilingual generative language models (LMs) are increasingly fluent in a large variety of languages. Trained on the concatenation of corpora in multiple languages, they enable powerful transfer from high-resource languages to low-resource ones. However, it is still unknown what cultural biases are induced in the predictions of these models. In this work, we focus on one language property highly influenced by culture: formality. We analyze the formality distributions of the predictions of XGLM and BLOOM, two popular generative multilingual language models, in 5 languages. We classify 1,200 generations per language as formal, informal, or incohesive and measure the impact of the prompt formality on the predictions. Overall, we observe a diversity of behaviors across the models and languages. For instance, XGLM generates informal text in Arabic and Bengali when conditioned on informal prompts, much more often than BLOOM. In addition, even though both models are highly biased toward the formal style when prompted neutrally, we find that they generate a significant number of informal predictions even when prompted with formal text. We release 6,000 annotated samples with this work, paving the way for future work on the formality of generative multilingual LMs.