Trends in open science have enabled several open scholarly datasets, which include millions of papers and authors. Managing, exploring, and utilizing such large and complicated datasets effectively is challenging. In recent years, the knowledge graph has emerged as a universal data format for representing knowledge about heterogeneous entities and their relationships. A knowledge graph can be modeled by knowledge graph embedding methods, which represent entities and relations as embedding vectors in a semantic space and then model the interactions between these embedding vectors. However, the semantic structures in the knowledge graph embedding space are not well studied; as a result, knowledge graph embedding methods are usually used only for knowledge graph completion, not for data representation and analysis. In this paper, we propose to analyze these semantic structures based on the well-studied word embedding space and to use them to support data exploration. We also define semantic queries, which are algebraic operations between the embedding vectors in the knowledge graph embedding space, to answer queries such as similarity and analogy between entities in the original datasets. We then design a general framework for data exploration by semantic queries and discuss solutions to some traditional scholarly data exploration tasks. We also propose some new and interesting tasks that can be solved based on the uncanny semantic structures of the embedding space.
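As a rough illustration of what a semantic query could look like, the sketch below runs a similarity query and an analogy query over a handful of embedding vectors. It is only a minimal sketch under stated assumptions, not the framework proposed in the paper: the entity names, the hand-written vector values, and the helper functions `similarity_query` and `analogy_query` are all hypothetical, and cosine similarity with vector offsets is borrowed from common word embedding practice.

```python
import numpy as np

# Hypothetical toy embeddings, written by hand for illustration only.
# In practice these vectors would come from a trained knowledge graph
# embedding model; none of the values below are taken from the paper.
entities = ["author_A", "author_B", "paper_X", "paper_Y", "paper_Z"]
E = np.array([
    [1.0, 0.0, 0.2],  # author_A
    [0.0, 1.0, 0.2],  # author_B
    [1.0, 0.1, 1.0],  # paper_X
    [0.1, 1.0, 1.0],  # paper_Y
    [0.9, 0.0, 1.1],  # paper_Z
])


def normalize(m):
    """Scale each vector to unit length so dot products equal cosine similarity."""
    return m / np.linalg.norm(m, axis=-1, keepdims=True)


def rank_by_cosine(target, exclude, k):
    """Return the k entities closest to `target` by cosine similarity."""
    scores = normalize(E) @ normalize(target)
    order = np.argsort(-scores)  # indices sorted by descending score
    return [entities[i] for i in order if entities[i] not in exclude][:k]


def similarity_query(name, k=1):
    """Similarity query: which entities are closest to `name`?"""
    return rank_by_cosine(E[entities.index(name)], {name}, k)


def analogy_query(a, b, c, k=1):
    """Analogy query: `a` is to `b` as `c` is to ?, using the offset b - a + c."""
    idx = entities.index
    target = E[idx(b)] - E[idx(a)] + E[idx(c)]
    return rank_by_cosine(target, {a, b, c}, k)


if __name__ == "__main__":
    print(similarity_query("paper_X"))
    print(analogy_query("author_A", "paper_X", "author_B"))
```

Both queries reduce to a nearest-neighbor search around a target vector; only the way the target is built differs. With embeddings learned from a real scholarly knowledge graph, the same two functions would apply unchanged, although the quality of the answers would depend on the embedding model.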