Large language models appear to learn facts from the large text corpora they are trained on. Such facts are encoded implicitly within their many parameters, making it difficult to verify or manipulate what knowledge has been learned. Language models have recently been extended to multilingual language models (MLLMs), enabling knowledge to be learned across hundreds of languages. Meanwhile, knowledge graphs contain facts in an explicit triple format, which require careful and costly curation and are only available in a few high-resource languages, restricting their research and application. To address these issues, we propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages, including low-resource ones. Specifically, we introduce a lightweight adapter set to enhance MLLMs with cross-lingual entity alignment and facts from MLKGs for many languages. Experiments on common benchmarks show that such enhancement benefits both MLLMs and MLKGs, achieving: (1) comparable or improved performance for knowledge graph completion and entity alignment relative to baselines, especially for low-resource languages (for which knowledge graphs are unavailable); and (2) improved MLLM performance on language understanding tasks that require multilingual factual knowledge; all while maintaining performance on other general language tasks.
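To make the adapter idea concrete, below is a minimal sketch of a bottleneck adapter of the kind commonly inserted into frozen pre-trained transformer layers (in the style of Houlsby et al., 2019). The abstract does not specify the paper's exact adapter architecture, so the hidden size, bottleneck dimension, and residual placement here are illustrative assumptions only, not the authors' design.

```python
# Minimal bottleneck adapter sketch (illustrative; the paper's exact adapter
# set, sizes, and placement are not given in the abstract).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, add a residual."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen MLLM's representation
        # intact when the adapter is close to the identity at initialization;
        # only the small adapter weights are trained on MLKG objectives.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


if __name__ == "__main__":
    adapter = BottleneckAdapter()
    x = torch.randn(2, 16, 768)  # (batch, sequence length, hidden size)
    print(adapter(x).shape)      # torch.Size([2, 16, 768])
```

In this style of setup, separate adapters could be trained on entity-alignment and fact-triple objectives while the underlying MLLM parameters stay frozen, which is consistent with the "lightweight" framing in the abstract, though the specific training objectives are described in the paper body rather than here.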