Data-driven analysis of complex networks has been a focus of research for decades. An important line of work studies how well real networks can be described with a small selection of metrics, and how well network models capture the relations between graph metrics observed in real networks. In this paper, we apply machine learning techniques to investigate both problems. We study 500 real-world networks along with 2,000 synthetic networks generated by four frequently used network models, whose parameters were calibrated beforehand to make the generated graphs as similar to the real networks as possible. This paper unifies several branches of data-driven complex network analysis, such as the study of graph metrics and their pairwise relationships, network similarity estimation, model calibration, and graph classification. We find that the correlation profiles of the structural measures differ significantly across network domains, and that the domain can be determined efficiently using a small selection of graph metrics. With fixed parameters, the structural properties of the network models are robust enough to allow parameter calibration. The goodness-of-fit of the network models depends strongly on the network domain. By solving classification problems, we find that the models cannot generate a graph that has both a high clustering coefficient and a relatively large diameter. In contrast, the models capture the degree-distribution-related metrics accurately.
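To make the notion of a small selection of graph metrics concrete, the following minimal sketch computes three of the structural measures mentioned above (average degree, average clustering coefficient, and diameter) for a toy graph. The ring-lattice generator and all function names are illustrative assumptions; they are not the four network models or the pipeline used in the paper.

```python
from collections import deque


def ring_lattice(n, k):
    """Hypothetical toy generator: n nodes on a ring, each linked to
    its k nearest neighbours on either side (not one of the paper's models)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for v, nb in adj.items():
        d = len(nb)
        if d < 2:
            continue
        # Each edge between two neighbours of v is seen twice, so halve the count.
        links = sum(1 for u in nb for w in adj[u] if w in nb) // 2
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def diameter(adj):
    """Longest shortest-path length, via BFS from every node.
    Assumes the graph is connected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        best = max(best, max(dist.values()))
    return best


def metric_vector(adj):
    """Bundle the three measures into one feature dictionary."""
    return {
        "avg_degree": average_degree(adj),
        "avg_clustering": average_clustering(adj),
        "diameter": diameter(adj),
    }


if __name__ == "__main__":
    print(metric_vector(ring_lattice(12, 2)))
```

A vector of this kind, computed for each network, is the sort of input a classifier could use to separate network domains or to tell real graphs from model-generated ones. The three measures chosen here are only an example of such a selection.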