Investigating whether pre-trained language models (LMs) can function as knowledge bases (KBs) has attracted wide research interest recently. However, existing works focus on simple, triple-based relational KBs and omit more sophisticated, logic-based, conceptualised KBs such as OWL ontologies. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tasks and datasets built from ontology subsumption axioms involving both atomic and complex concepts. We conduct extensive experiments on ontologies of different domains and scales, and our results demonstrate that LMs encode relatively less background knowledge of Subsumption Inference (SI) than of traditional Natural Language Inference (NLI) but can improve on SI significantly when a small number of samples are given. We will open-source our code and datasets.