Knowledge graph (KG) embedding techniques use the structured relationships between entities to learn low-dimensional representations of entities and relations. Traditional KG embedding techniques (such as TransE and DistMult) estimate these embeddings with simple models fitted to the observed KG triplets. These approaches differ in their triplet-scoring loss functions. Because these models use only the observed triplets to estimate the embeddings, they are prone to the data sparsity that is common in real-world knowledge graphs, i.e., too few triplets per entity. To address this problem, we propose an efficient method for augmenting the set of training triplets. We use random walks to create additional triplets, such that the relation carried by each introduced triplet corresponds to the metapath induced by the random walk. We also provide approaches to accurately and efficiently select the informative metapaths from the set of possible metapaths induced by the random walks. The proposed approaches are model-agnostic, and the augmented training dataset can be used with any KG embedding approach out of the box. Experimental results on benchmark datasets show the advantages of the proposed approach.
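The augmentation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how metapaths are filtered, so the frequency threshold `min_count` used here is a stand-in assumption, as are the function names (`build_adjacency`, `random_walk`, `augment`), the walk parameters, and the convention of naming a metapath relation by joining its relation labels with `/`.

```python
import random
from collections import Counter, defaultdict


def build_adjacency(triplets):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj = defaultdict(list)
    for head, rel, tail in triplets:
        adj[head].append((rel, tail))
    return adj


def random_walk(adj, start, length, rng):
    """Walk `length` edges from `start`; return (metapath, end) or None on a dead end."""
    node, path = start, []
    for _ in range(length):
        edges = adj.get(node)
        if not edges:
            return None
        rel, node = rng.choice(edges)
        path.append(rel)
    return tuple(path), node


def augment(triplets, walk_length=2, walks_per_entity=10, min_count=2, seed=0):
    """Return the original triplets plus (head, metapath, tail) triplets.

    ASSUMPTION: a metapath is kept when it links at least `min_count`
    distinct (head, tail) pairs. This is a placeholder filter, not the
    selection method of the paper.
    """
    rng = random.Random(seed)
    adj = build_adjacency(triplets)
    candidates = set()
    for head in sorted(adj):
        for _ in range(walks_per_entity):
            walk = random_walk(adj, head, walk_length, rng)
            if walk is not None:
                metapath, tail = walk
                candidates.add((head, metapath, tail))
    # Count how many distinct entity pairs each metapath connects.
    counts = Counter(metapath for _, metapath, _ in candidates)
    keep = {metapath for metapath, n in counts.items() if n >= min_count}
    new_triplets = [
        (head, "/".join(metapath), tail)
        for head, metapath, tail in sorted(candidates)
        if metapath in keep
    ]
    return list(triplets) + new_triplets


# Toy KG (made-up data, for illustration only).
toy = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "lyon"),
    ("paris", "city_of", "france"),
    ("lyon", "city_of", "france"),
]
augmented = augment(toy)
```

On this toy graph the only two-step metapath is `born_in/city_of`, which links both people to `france`, so two triplets are added. Because the output is an ordinary list of triplets, it could be passed unchanged to any triplet-based embedding trainer, which is the sense in which such augmentation is model-agnostic.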