Node clustering is a powerful tool in the analysis of networks. We introduce DIGRAC, a graph neural network framework with a novel probabilistic imbalance loss, which obtains node embeddings for directed networks in a self-supervised manner; these embeddings can be used for network clustering. We propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, directionality is not treated as a nuisance here but instead carries the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing GNN methods, and can naturally incorporate node features, unlike existing spectral methods. Experiments on synthetic data, in the form of directed stochastic block models, and on real-world data at different scales demonstrate that our flow-imbalance-based method attains state-of-the-art results on directed graph clustering across a wide range of noise and sparsity levels, graph structures, and topologies.
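To make the idea of flow imbalance between clusters concrete, the following is a minimal sketch, not the probabilistic imbalance loss of the paper. It assumes a hard cluster assignment and a simple normalized imbalance, |W(k,l) - W(l,k)| / (W(k,l) + W(l,k)), where W(k,l) is the total edge weight from cluster k to cluster l. The function name `pairwise_flow_imbalance` and the toy graph are hypothetical, chosen for illustration only.

```python
import numpy as np


def pairwise_flow_imbalance(adj, labels, num_clusters):
    """Illustrative (assumed) imbalance measure for a hard clustering.

    adj[i, j] is the weight of the directed edge i -> j.
    Returns (flow, imbalance), both of shape (K, K), where
    flow[k, l] is the total weight of edges from cluster k to cluster l and
    imbalance[k, l] = |flow[k, l] - flow[l, k]| / (flow[k, l] + flow[l, k]).
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    # One-hot assignment matrix of shape (n, K).
    onehot = np.eye(num_clusters)[labels]
    # Aggregate edge weights between every ordered pair of clusters.
    flow = onehot.T @ adj @ onehot
    net = np.abs(flow - flow.T)
    total = flow + flow.T
    # Pairs with no edges in either direction get imbalance 0.
    imbalance = np.divide(net, total, out=np.zeros_like(net), where=total > 0)
    np.fill_diagonal(imbalance, 0.0)
    return flow, imbalance


# Toy directed graph: nodes 0, 1 form cluster 0; nodes 2, 3 form cluster 1.
adj = np.zeros((4, 4))
for src, dst in [(0, 2), (0, 3), (1, 2), (3, 1)]:
    adj[src, dst] = 1.0
labels = [0, 0, 1, 1]

flow, imbalance = pairwise_flow_imbalance(adj, labels, num_clusters=2)
print(flow)
print(imbalance)
```

In this toy graph neither cluster has any internal edges, so the two groups cannot be told apart by density; the only signal is that more edge weight flows from cluster 0 to cluster 1 than in the reverse direction.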