Recent work has shown that Pre-trained Language Models (PLMs) store the relational knowledge learned from data and utilize it for performing downstream tasks. However, commonsense knowledge across different regions may vary. For instance, the color of a bridal dress is white in American weddings whereas it is red in Chinese weddings. In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs. GeoMLAMA contains 3,125 prompts in English, Chinese, Hindi, Persian, and Swahili, with a wide coverage of concepts shared by people from American, Chinese, Indian, Iranian and Kenyan cultures. We benchmark 11 standard multilingual PLMs on GeoMLAMA. Interestingly, we find that 1) larger multilingual PLM variants do not necessarily store geo-diverse concepts better than their smaller counterparts; 2) multilingual PLMs are not intrinsically biased towards knowledge from Western countries (the United States); 3) the native language of a country may not be the best language to probe its knowledge; and 4) a language may better probe knowledge about a non-native country than about its native country. Code and data are released at https://github.com/WadeYin9712/GeoMLAMA.
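For readers unfamiliar with prompt-based knowledge probing, the following is a minimal sketch of how a masked multilingual PLM can be queried with a GeoMLAMA-style prompt. It is an illustrative example only: it assumes the HuggingFace Transformers library and the public bert-base-multilingual-cased checkpoint, and the prompt text is a hypothetical stand-in, not a verbatim GeoMLAMA entry.

```python
# Minimal sketch of masked-LM probing (assumption: HuggingFace Transformers is installed).
# The prompt is an illustrative geo-diverse commonsense query, not an actual dataset item.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# The model ranks candidate fillers for the masked slot; the top prediction
# indicates which answer the PLM's stored knowledge favors.
prompt = "The color of the bridal dress at a traditional Chinese wedding is [MASK]."
for prediction in fill_mask(prompt, top_k=5):
    print(f"{prediction['token_str']}\t{prediction['score']:.4f}")
```

In the paper's setting, prompts like this are posed in each of the five languages and scored against country-specific gold answers, which is how the cross-lingual and cross-country comparisons in findings 1)–4) are obtained.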