The Internet is composed of networks, called Autonomous Systems (ASes), that interconnect with one another and thus form a large graph. Although the AS graph is known and a multitude of data is available for the ASes (i.e., node attributes), applying graph machine learning (ML) methods to Internet data has attracted little research attention. In this work, we provide a benchmarking framework that aims to facilitate research on Internet data using graph-ML and graph neural network (GNN) methods. Specifically, we compile a dataset with heterogeneous node/AS attributes by collecting data from multiple online sources and preprocessing them so that they can be easily used as input to GNN architectures. Then, we create a framework/pipeline for applying GNNs to the compiled data. For a set of tasks, we benchmark different GNN models (as well as non-GNN ML models) to test their efficiency; our results can serve as a common baseline for future research and provide initial insights into the application of GNNs to Internet data.
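As a rough illustration of what "heterogeneous node attributes as GNN input" can mean in practice, the following minimal sketch encodes a numeric and a categorical attribute per AS into fixed-length vectors and applies one round of mean neighbor aggregation, the basic operation underlying many GNN layers. It is not the framework described here: the AS numbers (taken from the range reserved for documentation), the attribute names `customer_cone` and `region`, their values, and all function names are assumptions made for the example.

```python
# Toy illustration only: AS numbers, attribute names, and values are made up.

def one_hot(value, categories):
    """Encode a categorical attribute as a 0/1 vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def build_features(attrs, categories):
    """Turn heterogeneous per-AS attributes into fixed-length numeric vectors."""
    max_size = max(a["customer_cone"] for a in attrs.values())
    features = {}
    for asn, a in attrs.items():
        scaled = a["customer_cone"] / max_size  # scale numeric attribute to [0, 1]
        features[asn] = [scaled] + one_hot(a["region"], categories)
    return features

def mean_aggregate(features, edges):
    """One round of message passing: average each node's vector with its neighbors'."""
    neighbors = {asn: {asn} for asn in features}  # each node also counts itself
    for u, v in edges:  # AS links are treated as undirected here
        neighbors[u].add(v)
        neighbors[v].add(u)
    out = {}
    for asn, nbrs in neighbors.items():
        dim = len(features[asn])
        out[asn] = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
    return out

# Hypothetical three-node AS graph with one numeric and one categorical attribute.
attrs = {
    64500: {"customer_cone": 100, "region": "EU"},
    64501: {"customer_cone": 10, "region": "NA"},
    64502: {"customer_cone": 1, "region": "EU"},
}
edges = [(64500, 64501), (64500, 64502)]

features = build_features(attrs, ["EU", "NA"])
embeddings = mean_aggregate(features, edges)
```

A real GNN layer would additionally multiply the aggregated vectors by learned weights and apply a nonlinearity; the sketch omits both to keep the data-preparation step visible.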