Representation learning for networks provides a new way to mine graphs. Although current research in this area generates reliable node embeddings, it is still limited to homogeneous networks, in which all nodes and edges are of the same type. Real-world graphs, however, are increasingly heterogeneous, with multiple node and edge types. Existing heterogeneous embedding methods are mostly task-based or can handle only a limited number of node and edge types. To address this challenge, this paper proposes edge2vec, a model that represents nodes in ways that incorporate edge semantics, expressed as the different edge types of a heterogeneous network. An edge-type transition matrix is optimized within an Expectation-Maximization (EM) framework and serves as an extra criterion for a biased node random walk on the network; a biased skip-gram model is then used to learn node embeddings from the resulting random walks. edge2vec is validated and evaluated on three medical-domain problems over an ensemble of complex medical networks (more than 10 node and edge types): medical entity classification, compound-gene binding prediction, and medical information searching cost. Results show that, by considering edge semantics, edge2vec significantly outperforms other state-of-the-art models on all three tasks.
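The walk step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `edge_type_biased_walk`, the adjacency format, the toy graph, and the fixed 2x2 transition matrix are all assumptions made for illustration. The EM optimization of the matrix and the skip-gram training are omitted, and any other walk biases the full model may apply are not modeled here.

```python
import random


def edge_type_biased_walk(adj, trans, start, length, rng):
    """Sample one walk whose steps are weighted by edge-type transitions.

    adj:   dict mapping node -> list of (neighbor, edge_type) pairs (assumed format)
    trans: trans[prev_type][next_type] is a non-negative transition weight
    """
    walk = [start]
    current = start
    prev_type = None  # no edge has been traversed before the first step
    while len(walk) < length:
        neighbors = adj.get(current, [])
        if not neighbors:
            break
        if prev_type is None:
            # First step: no previous edge type, so choose uniformly.
            weights = [1.0] * len(neighbors)
        else:
            # Later steps: weight each candidate edge by the transition
            # weight from the previous edge type to the candidate's type.
            weights = [trans[prev_type][etype] for _, etype in neighbors]
        if sum(weights) <= 0:
            break  # every continuation has zero weight
        nxt, etype = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(nxt)
        current, prev_type = nxt, etype
    return walk


# Hypothetical toy network: edge type 0 = compound-gene, 1 = gene-disease.
toy_adj = {
    "c1": [("g1", 0), ("g2", 0)],
    "g1": [("c1", 0), ("d1", 1)],
    "g2": [("c1", 0)],
    "d1": [("g1", 1)],
}
toy_trans = [[0.2, 0.8], [0.8, 0.2]]  # assumed values, not learned by EM

if __name__ == "__main__":
    print(edge_type_biased_walk(toy_adj, toy_trans, "c1", 6, random.Random(0)))
```

In a full pipeline, many such walks per node would be collected and passed to a skip-gram trainer as "sentences" of node identifiers; here the matrix is simply given, whereas the abstract describes it as being optimized by EM.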