Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structured knowledge-base queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT's performance as a knowledge base language-independent, or does it vary from language to language? (iii) A multilingual model is trained on more text; e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages, and that pooling predictions across languages improves performance. Conversely, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.
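To make the probing setup concrete, below is a minimal sketch (not taken from the paper) of cloze-style querying of mBERT with the Hugging Face transformers fill-mask pipeline. The checkpoint name, prompts, and top-k setting are illustrative assumptions; the paper's actual prompts come from the translated TREx and GoogleRE templates.

```python
# Minimal sketch of cloze-style knowledge probing with mBERT.
# Assumes the Hugging Face transformers library; the public mBERT checkpoint
# "bert-base-multilingual-cased" and the example prompts are assumptions,
# not the paper's exact experimental setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

prompts = [
    "Paris is the capital of [MASK].",       # English probe
    "Parigi è la capitale della [MASK].",    # Italian probe for the same fact
]

for prompt in prompts:
    print(prompt)
    # Retrieve the model's top-3 fillers for the [MASK] slot.
    for pred in fill_mask(prompt, top_k=3):
        print(f"  {pred['token_str']:>12}  score={pred['score']:.3f}")
```

Comparing the fillers returned for the two prompts is one way to observe the language bias the abstract mentions: the Italian probe may rank Italy-related tokens higher even though the queried fact is the same.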