Instability of trained models, i.e., the dependence of individual node predictions on random factors, can affect reproducibility, reliability, and trust in machine learning systems. In this paper, we systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs). Our experiments establish that multiple instantiations of popular GNN models, trained on the same data with the same hyperparameters, achieve almost identical aggregate performance but disagree substantially in their predictions for individual nodes. We find that up to one third of the incorrectly classified nodes differ across algorithm runs. We identify correlations of hyperparameters, node properties, and training set size with the stability of predictions. In general, maximizing model performance also implicitly reduces model instability.
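As a minimal illustration of the kind of instability measure described above, the Python sketch below computes the mean pairwise disagreement rate between the per-node predictions of several training runs. The function name `pairwise_disagreement` and the synthetic prediction arrays are our own illustrative choices, not code or data from the paper; in an actual experiment, each row of `runs` would hold the predicted labels produced by one GNN instantiation trained with a different random seed.

```python
import numpy as np
from itertools import combinations

def pairwise_disagreement(preds: np.ndarray) -> float:
    """Mean fraction of nodes whose predicted label differs
    between two independently trained model instantiations.

    preds: array of shape (num_runs, num_nodes) holding the
    predicted class label of every node for every run.
    """
    rates = [np.mean(a != b) for a, b in combinations(preds, 2)]
    return float(np.mean(rates))

# Toy demonstration with synthetic labels: five "runs" that each
# flip about 5% of a shared base prediction at random. In practice
# each row would come from retraining the same GNN on identical
# data and hyperparameters, varying only the random seed.
rng = np.random.default_rng(0)
base = rng.integers(0, 7, size=1000)
runs = np.stack([
    np.where(rng.random(1000) < 0.05,
             rng.integers(0, 7, size=1000), base)
    for _ in range(5)
])
print(f"mean pairwise disagreement: {pairwise_disagreement(runs):.3f}")
```

Averaging over all pairs of runs, rather than comparing every run to a single reference run, keeps the measure symmetric in the model instantiations; restricting the same computation to incorrectly classified nodes would yield the disagreement among misclassifications reported in the abstract.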