Node clustering is a powerful tool in the analysis of networks. We introduce DIGRAC, a graph neural network framework that obtains node embeddings for directed networks in a self-supervised manner and includes a novel probabilistic imbalance loss that can be used for network clustering. We propose \textit{directed flow imbalance} measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, directionality is not treated as a nuisance here but rather contains the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing graph neural network methods, and can naturally incorporate node features, unlike existing spectral methods. Extensive experiments on synthetic data, in the form of directed stochastic block models, and on real-world data at different scales demonstrate that our flow-imbalance-based method attains state-of-the-art results on directed graph clustering when compared against 10 state-of-the-art methods from the literature, across a wide range of noise and sparsity levels, graph structures, and topologies, and that it even outperforms supervised methods.
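To make the notion of flow imbalance concrete, the following is a minimal sketch of one plausible pairwise measure: the normalized net flow between two clusters. It is an illustration under stated assumptions, not the probabilistic imbalance loss that DIGRAC optimizes. The function name `flow_imbalance`, the normalization by total flow, and the toy graph are all hypothetical choices made for this example.

```python
import numpy as np


def flow_imbalance(adj, labels, a, b):
    """Normalized net flow between clusters a and b, in [0, 1].

    Illustrative measure only (an assumption, not the paper's loss):
    |W(a,b) - W(b,a)| / (W(a,b) + W(b,a)), where W(x,y) is the total
    edge weight going from cluster x to cluster y.
    """
    in_a = labels == a
    in_b = labels == b
    w_ab = adj[np.ix_(in_a, in_b)].sum()  # weight of edges a -> b
    w_ba = adj[np.ix_(in_b, in_a)].sum()  # weight of edges b -> a
    total = w_ab + w_ba
    return 0.0 if total == 0 else abs(w_ab - w_ba) / total


# Hypothetical toy graph: 4 nodes, adj[i, j] = 1 means an edge i -> j.
adj = np.zeros((4, 4))
adj[0, 1] = 1  # one edge inside cluster {0, 1}
adj[2, 3] = 1  # one edge inside cluster {2, 3}: same internal density
for i in (0, 1):
    for j in (2, 3):
        adj[i, j] = 1  # four edges from {0, 1} to {2, 3}
adj[2, 0] = 1  # a single edge in the reverse direction

aligned = np.array([0, 0, 1, 1])     # partition that follows the flow
misaligned = np.array([0, 1, 0, 1])  # partition that cuts across it

score_aligned = flow_imbalance(adj, aligned, 0, 1)
score_misaligned = flow_imbalance(adj, misaligned, 0, 1)
print(score_aligned, score_misaligned)
```

In this toy graph both clusters have the same internal edge density, so a density-based criterion has little to work with, yet the partition aligned with the dominant edge direction scores a higher imbalance than the misaligned one.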