Networks are frequently used to model complex systems composed of interacting elements. While links capture the topology of direct interactions, the true complexity of many systems originates from higher-order patterns in the paths by which nodes can indirectly influence each other. Path data, representing ordered sequences of consecutive direct interactions, can be used to model these patterns. However, to avoid overfitting, such models should only consider those higher-order patterns for which the data provide sufficient statistical evidence. On the other hand, we hypothesise that network models, which capture only direct interactions, underfit the higher-order patterns present in data. Consequently, both approaches are likely to misidentify influential nodes in complex networks. We address this issue by proposing eight centrality measures based on MOGen, a multi-order generative model that accounts for all paths up to a maximum distance but disregards paths beyond that distance. We compare MOGen-based centralities to equivalent measures for network models and path data in a prediction experiment where we aim to identify influential nodes in out-of-sample data. Our results provide strong evidence for our hypothesis: MOGen consistently outperforms both the network model and path-based prediction. We further show that the performance difference between MOGen and the path-based approach disappears given sufficient observations, confirming that the path-based approach's error is due to overfitting.
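The underfitting argument can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not MOGen itself: the path data are hypothetical, and the helper `transition_probs` is an illustrative maximum-likelihood estimator of order-k transition probabilities. A first-order (network) model only sees the links a→c, b→c, c→d and c→e, so it predicts that a walk arriving at c from a continues to e half of the time, although no observed path does so. A second-order model captures this pattern. With only a handful of observations, however, such higher-order estimates would rest on little evidence, which is the overfitting risk that motivates combining orders.

```python
from collections import Counter, defaultdict

# Hypothetical toy path data: each tuple is one observed path (ordered node sequence).
paths = [("a", "c", "d")] * 5 + [("b", "c", "e")] * 5


def transition_probs(paths, order):
    """Illustrative helper: maximum-likelihood estimate of P(next node | previous `order` nodes)."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            context = path[i - order:i]  # the `order` nodes preceding position i
            counts[context][path[i]] += 1
    return {
        ctx: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for ctx, c in counts.items()
    }


first = transition_probs(paths, order=1)   # network model: direct links only
second = transition_probs(paths, order=2)  # conditions on the previous two nodes

# Probability that a walk that went a -> c continues to e.
p_network = first[("c",)].get("e", 0.0)   # ignores where the walk came from
p_second = second[("a", "c")].get("e", 0.0)
print(p_network, p_second)  # 0.5 0.0
```

The first-order model assigns probability 0.5 to the never-observed path a→c→e, whereas the second-order model assigns it 0.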