Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are explainable only post hoc and locally, and they do not allow the learned model to be easily enriched with domain knowledge. To this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time, and state-of-the-art approaches provide limited support for literal data values, although these are crucial for many applications. In this paper, we propose EvoLearner, an evolutionary approach to learning concepts in ALCQ(D), the attributive language with complement (ALC) extended with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel method for initializing the population: starting from positive examples (nodes in the knowledge graph), we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on SML-Bench, a benchmarking framework for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties.
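The idea of choosing split points for a numeric data property by maximizing information gain can be illustrated with a minimal sketch. This is not EvoLearner's implementation: the function names `entropy` and `best_split`, the midpoint thresholds, and the restriction to a single binary split are assumptions made for illustration only.

```python
import math
from typing import List, Optional, Tuple


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def best_split(values: List[float], labels: List[bool]) -> Optional[Tuple[float, float]]:
    """Return (threshold, information_gain) of the best binary split.

    `values` are literal values of one data property (hypothetical input),
    `labels` mark each example as positive (True) or negative (False).
    Returns None when all values are identical, so no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, is_pos in pairs if is_pos)
    total_neg = n - total_pos
    base = entropy(total_pos, total_neg)

    best: Optional[Tuple[float, float]] = None
    left_pos = left_neg = 0
    for i in range(n - 1):
        value, is_pos = pairs[i]
        if is_pos:
            left_pos += 1
        else:
            left_neg += 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # a threshold can only separate distinct values
        left_n = i + 1
        right_n = n - left_n
        # Weighted entropy of the two partitions after the split.
        remainder = (
            (left_n / n) * entropy(left_pos, left_neg)
            + (right_n / n) * entropy(total_pos - left_pos, total_neg - left_neg)
        )
        gain = base - remainder
        if best is None or gain > best[1]:
            # Assumption: use the midpoint between neighbouring values as threshold.
            best = ((value + next_value) / 2, gain)
    return best


if __name__ == "__main__":
    # Two positives with small values, two negatives with large values.
    print(best_split([1.0, 2.0, 3.0, 4.0], [True, True, False, False]))
```

A threshold found this way could then serve as the bound in a data property restriction (for example, "value at most 2.5"); how the thresholds are actually used inside the evolutionary search is not specified in the abstract.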