Networks are frequently used to model complex systems composed of interacting elements. While edges capture the topology of direct interactions, the true complexity of many systems originates from higher-order patterns in the paths by which nodes can indirectly influence each other. Path data, representing ordered sequences of consecutive direct interactions, can be used to model these patterns. On the one hand, to avoid overfitting, such models should only consider those higher-order patterns for which the data provide sufficient statistical evidence. On the other hand, we hypothesise that network models, which capture only direct interactions, underfit higher-order patterns present in the data. Consequently, both approaches are likely to misidentify influential nodes in complex networks. We address this issue by proposing five centrality measures based on MOGen, a multi-order generative model that accounts for all indirect influences up to a maximum distance but disregards influences at greater distances. We compare MOGen-based centralities to equivalent measures for network models and path data in a prediction experiment in which we aim to identify influential nodes in out-of-sample data. Our results provide strong evidence for our hypothesis: MOGen consistently outperforms both the network model and the path-based prediction. We further show that the performance difference between MOGen and the path-based approach disappears given sufficient observations, confirming that the error of the path-based approach is due to overfitting.