Node clustering is a powerful tool for network analysis. We introduce a graph neural network framework that uses a novel, scalable Directed Mixed Path Aggregation (DIMPA) scheme to obtain node embeddings for directed networks in a self-supervised manner, trained with a novel probabilistic imbalance loss. The method is end-to-end: it combines embedding generation and clustering without an intermediate step. In contrast to standard approaches in the literature, directionality is not treated as a nuisance but as the main signal. In particular, we leverage the recently introduced cut flow imbalance measure, which is closely tied to directionality; cut flow imbalance is optimized without resorting to spectral methods or cluster labels. Experimental results on synthetic data, in the form of directed stochastic block models, and on real-world data at different scales demonstrate that our method attains state-of-the-art results on directed clustering across a wide range of noise and sparsity levels and graph structures.
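To make the idea of a flow imbalance between clusters concrete, the following is a minimal sketch and not the paper's loss. It assumes a directed adjacency matrix `A` and a soft cluster-assignment matrix `P` (rows summing to 1), and it uses one simple normalization, |W(k,l) - W(l,k)| / (W(k,l) + W(l,k)), which is an assumption made for illustration. The function name `pairwise_flow_imbalance` and the toy graph are hypothetical.

```python
import numpy as np


def pairwise_flow_imbalance(A, P):
    """Illustrative pairwise flow imbalance between clusters.

    A: (n, n) array, A[i, j] is the weight of the directed edge i -> j.
    P: (n, K) array of cluster-assignment probabilities (rows sum to 1).
    Returns a (K, K) symmetric array with entries in [0, 1].
    The normalization is an assumption for this sketch.
    """
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    # W[k, l] is the expected total edge weight flowing from cluster k to cluster l.
    W = P.T @ A @ P
    diff = np.abs(W - W.T)
    total = W + W.T
    # Pairs with no flow in either direction get imbalance 0.
    imbalance = np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)
    np.fill_diagonal(imbalance, 0.0)
    return imbalance


# Toy graph: nodes {0, 1} form cluster 0 and nodes {2, 3} form cluster 1.
# Three edges go from cluster 0 to cluster 1 and one edge goes back.
A = np.zeros((4, 4))
A[0, 2] = A[0, 3] = A[1, 2] = 1.0
A[3, 1] = 1.0
P = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

imb = pairwise_flow_imbalance(A, P)
print(imb[0, 1])
```

Because the quantity depends on `P` only through matrix products, it is differentiable with respect to the assignment probabilities, which is what allows an imbalance-style objective to be optimized without cluster labels or a spectral decomposition.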