While representation learning has yielded great success on many graph learning tasks, there is little understanding of the structures that these embeddings capture. For example, we ask whether topological features, such as the Triangle Count, the node Degree, and other centrality measures, are concretely encoded in the embeddings. Furthermore, we ask whether the presence of these structures in the embeddings is necessary for better performance on downstream tasks such as clustering and classification. To address these questions, we conduct an extensive empirical study over three classes of unsupervised graph embedding models and seven different variants of Graph Autoencoders. Our results show that five topological features, namely the Degree, the Local Clustering Score, the Betweenness Centrality, the Eigenvector Centrality, and the Triangle Count, are concretely preserved in the first layer of the graph autoencoder that employs the SUM aggregation rule, provided that the model preserves the second-order proximity. We provide further evidence for the presence of these features by revealing a hierarchy in the distribution of the topological features in the embeddings of the aforementioned model. We also show that a model with these properties can outperform other models on certain downstream tasks, especially when the preserved features are relevant to the task at hand. Finally, we evaluate the applicability of our findings through a case study on social influence prediction.