The growing number of semantic resources offers a valuable store of human knowledge; however, the probability of erroneous entries increases with their size. Developing approaches that identify potentially spurious parts of a given knowledge base is therefore becoming an increasingly important area of interest. In this work, we present a systematic evaluation of whether structure-only link analysis methods can already offer a scalable means of detecting possible anomalies, as well as potentially interesting novel relation candidates. Evaluating thirteen methods on eight different semantic resources, including the Gene Ontology, Food Ontology, Marine Ontology and others, we demonstrate that structure-only link analysis can offer scalable anomaly detection for a subset of the data sets. Further, we demonstrate that by considering symbolic node embeddings, explanations of the predictions (links) can be obtained, making this branch of methods potentially more valuable than purely black-box ones. To our knowledge, this is currently one of the most extensive systematic studies of the applicability of different types of link analysis methods across semantic resources from different domains.
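To make "structure-only link analysis" concrete, the sketch below scores links using nothing but graph topology. It uses the classic Adamic-Adar index purely as an illustration; it is not claimed to be one of the thirteen methods evaluated here. Existing edges with the lowest scores (little shared-neighbour support) are treated as possible anomalies, and high-scoring non-adjacent pairs as candidate novel relations. Ignoring edge direction and relation labels, the function names, and the toy graph are all illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; relation labels and direction are ignored (structure only)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # drop self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1/log(deg(w)) over the shared neighbours w of u and v."""
    common = adj.get(u, set()) & adj.get(v, set())
    # Each shared neighbour touches both u and v, so deg(w) >= 2 and log(deg) > 0.
    return math.fsum(1.0 / math.log(len(adj[w])) for w in common)


def rank_existing_edges(edges):
    """Existing edges sorted by ascending score: weakly supported (suspicious) links first."""
    adj = build_adjacency(edges)
    scored = [(u, v, adamic_adar(adj, u, v)) for u, v in edges if u != v]
    return sorted(scored, key=lambda t: t[2])


def rank_missing_links(edges, top_k=10):
    """Non-adjacent pairs that share a neighbour, sorted by descending score."""
    adj = build_adjacency(edges)
    scores = defaultdict(float)
    for w, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        weight = 1.0 / math.log(len(nbrs))
        # Every pair of w's neighbours gains w's contribution if not already linked.
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:
                scores[(u, v)] += weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy graph: two dense clusters joined by one unsupported edge (a, z).
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
             ("y", "z"), ("z", "w"), ("y", "w"), ("a", "z")]
    print(rank_existing_edges(edges)[0])       # ('a', 'z', 0.0)
    print(rank_missing_links(edges, top_k=1))  # the pair ('a', 'd') ranks first
```

On this toy graph the bridging edge (a, z) has no shared neighbours and surfaces first as an anomaly candidate, while (a, d), which shares two neighbours, is the strongest missing-link candidate. Because the score decomposes into named shared neighbours, each prediction can also be traced back to the nodes that support it.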