Word and graph embeddings are widely used in deep learning applications. We present a data structure that captures inherent hierarchical properties from an unordered, flat embedding space, in particular a sense of direction between pairs of entities. Inspired by the notion of \textit{distributional generality}, our algorithm constructs an arborescence (a directed rooted tree) by inserting nodes in descending order of entity power (e.g., word frequency), pointing each entity to the closest more powerful node as its parent. We evaluate the resulting tree structures on three tasks: hypernym relation discovery, least-common-ancestor (LCA) discovery among words, and Wikipedia page link recovery. We achieve average scores of 8.98\% and 2.70\% on hypernym and LCA discovery, respectively, across five languages, and 62.76\% accuracy on directed Wiki-page link recovery, all substantially above baselines. Finally, we investigate the effect of insertion order, the power/similarity trade-off, and various power sources to optimize parent selection.
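For concreteness, the following is a minimal sketch of the insertion procedure the abstract describes. The function name, the dictionary-based interface, and the use of cosine similarity as the closeness measure are our assumptions for illustration, not the authors' exact implementation.

\begin{verbatim}
import numpy as np

def build_arborescence(vectors, powers):
    """Sketch of the tree construction: insert entities in
    descending order of power and attach each to the closest
    already-inserted (hence more powerful) node.

    vectors: dict mapping entity -> embedding (np.ndarray)
    powers:  dict mapping entity -> power score (e.g., frequency)
    Returns: dict mapping each entity to its parent (root -> None).
    """
    # Insertion order: descending entity power.
    order = sorted(vectors, key=lambda e: powers[e], reverse=True)
    parent = {order[0]: None}  # most powerful entity is the root
    inserted = [order[0]]
    for entity in order[1:]:
        v = vectors[entity]
        # Parent = closest previously inserted node, here by
        # cosine similarity (an assumed choice of metric).
        best = max(
            inserted,
            key=lambda p: np.dot(v, vectors[p])
            / (np.linalg.norm(v) * np.linalg.norm(vectors[p])),
        )
        parent[entity] = best
        inserted.append(entity)
    return parent
\end{verbatim}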