Graph neural networks (GNNs) have achieved strong performance in various applications. In the real world, network data usually arrives in a streaming fashion, and the distributions of patterns, which refer to the neighborhood information of nodes, may shift over time. The GNN model must learn new patterns that it cannot yet capture, but incremental learning leads to the catastrophic forgetting problem, in which historical knowledge is overwritten by newly learned knowledge. It is therefore important to train the GNN model to learn new patterns and maintain existing patterns simultaneously, a problem that few works address. In this paper, we propose a streaming GNN model based on continual learning, so that the model is trained incrementally and up-to-date node representations can be obtained at each time step. First, we design an approximation algorithm to detect newly emerging patterns efficiently based on information propagation. Second, we combine two perspectives, data replaying and model regularization, to consolidate existing patterns. Specifically, we design a hierarchy-importance sampling strategy for nodes and derive a weighted regularization term for the GNN parameters, achieving greater stability and generalization of knowledge consolidation. We evaluate our model on real and synthetic datasets and compare it with multiple baselines. The node classification results show that our model can efficiently update its parameters and achieve performance comparable to model retraining. In addition, we conduct a case study on the synthetic data and analyze each component of our model, illustrating its ability to learn new knowledge and maintain existing knowledge from different perspectives.
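To make the data-replaying perspective concrete, the following is a minimal sketch, not the paper's exact algorithm: nodes are drawn into a replay buffer with probability proportional to an importance score, so that nodes carrying influential or rare historical patterns are more likely to be replayed when the GNN is updated on new graph data. The function name and the importance scores are illustrative assumptions.

```python
import random

def sample_replay_nodes(node_importance, k, seed=0):
    """Sample up to k distinct node ids, each drawn with probability
    proportional to its importance score (sampling without replacement).

    node_importance: dict mapping node id -> non-negative importance score.
    This is a toy stand-in for the paper's hierarchy-importance strategy.
    """
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    remaining = list(node_importance)
    chosen = []
    for _ in range(min(k, len(remaining))):
        total = sum(node_importance[n] for n in remaining)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for n in remaining:
            acc += node_importance[n]
            if acc >= r:
                chosen.append(n)
                remaining.remove(n)
                break
    return chosen

# Hypothetical usage: node "b" dominates the importance mass, so it is
# very likely to appear in the sampled replay set.
buffer = sample_replay_nodes({"a": 1.0, "b": 5.0, "c": 0.1}, k=2)
```

In a full system, the sampled nodes (with their neighborhoods and labels) would be mixed into each incremental training batch alongside the new streaming data.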
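The model-regularization perspective can be illustrated with a minimal sketch of an EWC-style weighted penalty; this is an assumption about the general form, not the paper's exact derivation. Each GNN parameter is assigned an importance weight (here a diagonal-Fisher proxy built from squared gradients), and deviations from the previously learned parameter values are penalized in proportion to that importance, with `lam` trading off plasticity against stability.

```python
import numpy as np

def importance_weights(grads):
    """Diagonal-Fisher proxy: average of squared per-sample gradients.

    grads: list of 1-D numpy arrays, one gradient vector per sample
    drawn from the previously learned distribution.
    """
    return np.mean(np.square(np.stack(grads)), axis=0)

def weighted_reg_loss(theta, theta_old, fisher, lam=1.0):
    """Penalty (lam / 2) * sum_i fisher_i * (theta_i - theta_old_i)^2.

    Coordinates that were important for historical patterns (large
    fisher_i) are discouraged from drifting while new data is learned.
    """
    return 0.5 * lam * np.sum(fisher * np.square(theta - theta_old))

# Hypothetical usage with toy numbers: the second coordinate has large
# importance, so its drift dominates the penalty.
theta_old = np.array([1.0, -2.0, 0.5])
grads = [np.array([0.2, 1.0, 0.0]), np.array([0.0, 1.2, 0.1])]
F = importance_weights(grads)            # -> array([0.02, 1.22, 0.005])
theta_new = theta_old + 0.5              # uniform drift of 0.5 per coordinate
penalty = weighted_reg_loss(theta_new, theta_old, F, lam=2.0)  # -> 0.31125
```

In training, this penalty would be added to the task loss on new data, so gradient descent balances fitting new patterns against preserving parameters that encode existing ones.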