Graph representation learning (also called graph embedding) is a popular technique for incorporating network structure into machine learning models. Unsupervised graph embedding methods aim to capture graph structure by learning a low-dimensional vector representation (the embedding) for each node. Despite the widespread use of these embeddings in a variety of downstream transductive machine learning tasks, there is little principled analysis of how effective this approach is for common tasks. In this work, we provide an empirical and theoretical analysis of the performance of a class of embeddings on the common task of pairwise community labeling. This is a binary variant of the classic community detection problem, which seeks to build a classifier that determines whether a pair of vertices participate in a community. In line with our goal of foundational understanding, we focus on a popular class of unsupervised embedding techniques that learn low-rank factorizations of a vertex proximity matrix (this class includes methods such as GraRep, DeepWalk, node2vec, and NetMF). We perform a detailed empirical analysis of community labeling over a variety of real and synthetic graphs with ground truth. In all cases we studied, the models trained on embedding features perform poorly on community labeling. In contrast, a simple logistic model with classic graph structural features handily outperforms the embedding models. For a more principled understanding, we provide a theoretical analysis of the (in)effectiveness of these embeddings in capturing community structure. We formally prove that popular low-dimensional factorization methods either cannot produce community structure or can only produce "unstable" communities, that is, communities that are inherently unstable under small perturbations.
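To make the phrase "low-rank factorization of a vertex proximity matrix" concrete, the following is a minimal sketch, not the procedure used in this work. It assumes a toy graph (two 4-cliques joined by one edge), a hypothetical proximity matrix built from one- and two-step walk counts, and a truncated SVD as the factorization; methods such as GraRep and NetMF use different, typically log-transformed, proximity matrices. The Hadamard-product pair feature is likewise an illustrative assumption about how a pairwise classifier could consume the embeddings.

```python
import numpy as np


def low_rank_embedding(proximity, dim):
    """Rank-`dim` factorization of a square proximity matrix via truncated SVD.

    Returns source embeddings, target ("context") embeddings, and all
    singular values, so that source @ target.T is the best rank-`dim`
    approximation of `proximity` in Frobenius norm.
    """
    u, s, vt = np.linalg.svd(proximity, full_matrices=False)
    scale = np.sqrt(s[:dim])
    source = u[:, :dim] * scale      # one row per vertex
    target = vt[:dim, :].T * scale   # one row per vertex
    return source, target, s


def pair_features(emb, i, j):
    """Hadamard-product feature vector for the vertex pair (i, j) (assumed choice)."""
    return emb[i] * emb[j]


# Toy graph (assumption): two 4-cliques joined by the single edge (3, 4).
n = 8
adj = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for a in block:
        for b in block:
            if a != b:
                adj[a, b] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

# Hypothetical proximity matrix: one-step plus two-step walk counts.
proximity = adj + adj @ adj

src, tgt, sing = low_rank_embedding(proximity, dim=2)
approx = src @ tgt.T
err = np.linalg.norm(proximity - approx)  # Frobenius reconstruction error

# Feature vector that a pairwise community classifier could be trained on.
feat = pair_features(src, 0, 1)
```

A pairwise community labeling model would then be fit on `pair_features` vectors for labeled vertex pairs. The reconstruction error `err` equals the root sum of squares of the discarded singular values, which shows how much of the proximity structure a 2-dimensional embedding cannot represent.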