Most existing knowledge graphs (KGs) in academic domains suffer from insufficient multi-relational information, name ambiguity, and data formats ill-suited to large-scale machine processing. In this paper, we present AceKG, a new large-scale KG in the academic domain. AceKG not only provides clean academic information, but also offers a large-scale benchmark dataset for researchers to conduct challenging data mining projects, including link prediction, community detection, and scholar classification. Specifically, AceKG describes 3.13 billion triples of academic facts based on a consistent ontology, including the necessary properties of papers, authors, fields of study, venues, and institutes, as well as the relations among them. To enrich the proposed knowledge graph, we also perform rule-based inference and entity alignment with existing databases. Based on AceKG, we conduct experiments on three typical academic data mining tasks and evaluate several state-of-the-art knowledge embedding and network representation learning approaches on the benchmark datasets built from AceKG. Finally, we discuss several promising research directions that benefit from AceKG.
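As a rough illustration of what a triple-based academic KG with rule-based inference can look like, the following minimal Python sketch stores facts as (head, relation, tail) triples and applies a single rule to derive new ones. The entity identifiers, the relation names (`paper_is_written_by`, `paper_is_in_field`, `author_is_in_field`), and the rule itself are hypothetical and chosen only for illustration; they are not taken from AceKG's actual ontology or inference rules.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in the graph: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


# Hypothetical toy facts; identifiers and relation names are assumptions.
facts = {
    Triple("paper:P1", "paper_is_written_by", "author:A1"),
    Triple("paper:P1", "paper_is_in_field", "field:F1"),
    Triple("paper:P2", "paper_is_written_by", "author:A1"),
    Triple("paper:P2", "paper_is_written_by", "author:A2"),
    Triple("paper:P2", "paper_is_in_field", "field:F2"),
}


def infer_author_fields(triples):
    """Apply one illustrative rule:
    (p, paper_is_written_by, a) and (p, paper_is_in_field, f)
        => (a, author_is_in_field, f)
    Returns only the triples that are not already present.
    """
    authors_of = {}  # paper -> set of authors
    fields_of = {}   # paper -> set of fields
    for t in triples:
        if t.relation == "paper_is_written_by":
            authors_of.setdefault(t.head, set()).add(t.tail)
        elif t.relation == "paper_is_in_field":
            fields_of.setdefault(t.head, set()).add(t.tail)

    inferred = set()
    for paper, authors in authors_of.items():
        for field in fields_of.get(paper, ()):
            for author in authors:
                inferred.add(Triple(author, "author_is_in_field", field))
    return inferred - set(triples)


new_facts = infer_author_fields(facts)
for t in sorted(new_facts):
    print(t.head, t.relation, t.tail)
```

On this toy input the rule yields three new triples, linking `author:A1` to both fields and `author:A2` to `field:F2`. A graph at the scale described in the abstract would need a dedicated triple store or a distributed pipeline rather than in-memory sets; the sketch only shows the shape of the data and of one inference step.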