Recent work on representation learning for graph-structured data has predominantly focused on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks, such as graph classification and clustering, require representing entire graphs as fixed-length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence suffer from problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrarily sized graphs. graph2vec's embeddings are learnt in an unsupervised manner and are task-agnostic; hence, they can be used for any downstream task such as graph classification, clustering, and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracy over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.
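To make the substructure-based view concrete, the following is a minimal, illustrative sketch of Weisfeiler-Lehman (WL) relabeling, the rooted-subgraph extraction commonly used both by graph kernels and by graph2vec-style embedding methods; each graph's multiset of WL labels can then play the role of a "document" of "words" for doc2vec-style training. All function and variable names here are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

def wl_relabel(adj, labels, iterations=2):
    """Return the multiset of WL rooted-subgraph labels for one graph.

    adj    -- dict: node -> list of neighbour nodes
    labels -- dict: node -> initial label (e.g. node degree as a string)
    """
    features = list(labels.values())          # iteration-0 labels
    current = dict(labels)
    for _ in range(iterations):
        new = {}
        for node, neigh in adj.items():
            # A node's new label encodes its old label plus the sorted
            # multiset of its neighbours' labels, i.e. an identifier for
            # the rooted subgraph of growing depth around the node.
            sig = (current[node], tuple(sorted(current[n] for n in neigh)))
            new[node] = str(hash(sig))
        current = new
        features.extend(current.values())
    return features

# Tiny example: a triangle (nodes 0, 1, 2) with a pendant node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {n: str(len(adj[n])) for n in adj}   # degree labels
feats = wl_relabel(adj, labels)
```

Structurally equivalent nodes (here 0 and 1) receive identical labels at every iteration, so the resulting label multisets are permutation-invariant graph features.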