Graph sampling extracts a small, representative subgraph from a large graph. Sampling algorithms use different strategies to preserve the properties of the original graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms with respect to six graph properties: degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graphs grouped into five categories: collaboration, social, citation, technological, and synthetic graphs. We report both qualitative and quantitative results. We find that no single method extracts true samples of a given graph with respect to all the properties tested in this work. Our results also show that the sampling algorithm that aggressively explores the neighborhood of each sampled node outperforms the others.
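To make the contrast between sampling strategies concrete, here is a minimal sketch assuming an undirected graph stored as an adjacency dictionary. The abstract does not name the five algorithms it evaluates, so the two samplers below are illustrative stand-ins only: uniform random node sampling, and a BFS/snowball-style sampler that adds every neighbor of each visited node (one way to "aggressively explore the neighborhood" of a sampled node). All function names (`random_node_sample`, `snowball_sample`, `ring_lattice`, etc.) are hypothetical. Two of the six properties are computed on each sample: average degree and the global clustering coefficient (transitivity, i.e. 3 × triangles / connected triples).

```python
import random
from collections import deque

# Undirected graph as adjacency sets (illustrative representation, an assumption).
Graph = dict[int, set[int]]


def ring_lattice(n: int, half_k: int) -> Graph:
    """Synthetic test graph: each node links to its half_k nearest neighbors on each side."""
    g: Graph = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, half_k + 1):
            v = (u + step) % n
            g[u].add(v)
            g[v].add(u)
    return g


def induced_subgraph(g: Graph, nodes: set[int]) -> Graph:
    """Keep only the edges whose endpoints are both in `nodes`."""
    return {u: g[u] & nodes for u in nodes}


def random_node_sample(g: Graph, k: int, rng: random.Random) -> Graph:
    """Hypothetical baseline: pick k nodes uniformly at random, return the induced subgraph."""
    return induced_subgraph(g, set(rng.sample(sorted(g), k)))


def snowball_sample(g: Graph, k: int, rng: random.Random) -> Graph:
    """Hypothetical neighborhood-exploring sampler: BFS from a random seed,
    adding every unvisited neighbor of each visited node until k nodes are collected."""
    seed = rng.choice(sorted(g))
    visited = {seed}
    queue = deque([seed])
    while queue and len(visited) < k:
        u = queue.popleft()
        for v in sorted(g[u]):
            if len(visited) >= k:
                break
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return induced_subgraph(g, visited)


def average_degree(g: Graph) -> float:
    """Mean number of neighbors per node (0.0 for an empty graph)."""
    return sum(len(nbrs) for nbrs in g.values()) / len(g) if g else 0.0


def global_clustering(g: Graph) -> float:
    """Transitivity: closed triples / connected triples (= 3 * triangles / triples)."""
    closed = 0   # neighbor pairs that are themselves adjacent, summed over all centers
    triples = 0  # all neighbor pairs, summed over all centers
    for nbrs in g.values():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        nl = sorted(nbrs)
        for i in range(d):
            for j in range(i + 1, d):
                if nl[j] in g[nl[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    rng = random.Random(7)
    g = ring_lattice(20, 2)
    print("original", average_degree(g), global_clustering(g))
    for name, sampler in [("random node", random_node_sample), ("snowball", snowball_sample)]:
        s = sampler(g, 8, rng)
        print(name, average_degree(s), global_clustering(s))
```

Comparing each sample's property values with those of the original graph is the basic idea behind judging how "representative" a sample is; because the snowball sampler only adds neighbors of already-visited nodes, its sample is always connected, whereas uniform node sampling can leave isolated nodes and lose local structure.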