Social network analysis (SNA) has drawn much attention in recent years, but one bottleneck is that network data are often too massive to handle. Furthermore, some network data are inaccessible because of privacy concerns. We therefore need sampling methods that draw representative sample graphs from the population graph. In this paper, we introduce two sampling strategies, Metropolis-Hastings Random Walk (MHRW) and Random Walk with Jumps (RWwJ), covering the procedure for collecting nodes, the underlying mathematical theory, and the corresponding estimators. Comparing our methods with existing research results, we found that MHRW performs better when estimating the degree distribution (61% less error than RWwJ) and the graph order (0.69% less error than RWwJ). RWwJ gives better estimates of the average follower-to-following ratio and of the proportion of mutual relationships among adjacent relationships, with 13% and 6% less error than MHRW, respectively. We analyze the reasons for these outcomes and suggest directions for future work.
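The abstract names the two samplers without spelling out how they move. The sketch below shows the commonly used textbook formulations and a reweighted degree-distribution estimator for each. It is not necessarily the exact procedure used in this paper. The assumptions are as follows. The graph is treated as undirected, although follower/following data are directed. MHRW needs a connected graph with no isolated nodes. RWwJ can jump to a uniformly random node, for example by drawing a random user ID. The function names `mhrw_sample`, `rwwj_sample`, `degree_distribution` and the jump weight `w` are hypothetical. MHRW accepts a move from `u` to neighbour `v` with probability min(1, deg(u)/deg(v)), which makes its stationary distribution uniform, so its samples need no reweighting. RWwJ jumps with probability w/(deg(u)+w), which makes the stationary probability of `v` proportional to deg(v)+w, so each sample is weighted by 1/(deg(v)+w).

```python
import random
from collections import Counter


def mhrw_sample(adj, start, n_steps, rng):
    """Metropolis-Hastings random walk targeting the uniform node distribution.

    adj maps each node to a list of neighbours of an undirected, connected
    graph with no isolated nodes (an assumption of this sketch).
    """
    u = start
    samples = []
    for _ in range(n_steps):
        v = rng.choice(adj[u])  # propose a uniformly random neighbour
        # Accept with probability min(1, deg(u) / deg(v)); on rejection stay at u.
        if rng.random() < len(adj[u]) / len(adj[v]):
            u = v
        samples.append(u)
    return samples


def rwwj_sample(adj, start, n_steps, w, rng):
    """Random walk with jumps: stationary probability of v is proportional to deg(v) + w.

    Assumes any node can be reached uniformly at random (e.g. via a random user ID).
    """
    nodes = list(adj)
    u = start
    samples = []
    for _ in range(n_steps):
        if rng.random() < w / (len(adj[u]) + w):
            u = rng.choice(nodes)   # jump to a uniformly random node
        else:
            u = rng.choice(adj[u])  # ordinary random-walk step to a neighbour
        samples.append(u)
    return samples


def degree_distribution(adj, samples, bias):
    """Estimate P(degree = k) by weighting each sample with 1 / bias(v),
    where bias(v) is proportional to the walk's stationary probability of v."""
    mass = Counter()
    total = 0.0
    for v in samples:
        weight = 1.0 / bias(v)
        mass[len(adj[v])] += weight
        total += weight
    return {k: m / total for k, m in sorted(mass.items())}


if __name__ == "__main__":
    # Toy undirected graph: a star centred on node 0, plus the edge 1-2.
    adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
    rng = random.Random(0)
    mh = mhrw_sample(adj, 0, 100_000, rng)
    print(degree_distribution(adj, mh, bias=lambda v: 1.0))
    w = 1.0
    rj = rwwj_sample(adj, 0, 100_000, w, rng)
    print(degree_distribution(adj, rj, bias=lambda v: len(adj[v]) + w))
```

In the toy graph, two nodes have degree 1, two have degree 2, and one has degree 4. Both estimates should therefore approach {1: 0.4, 2: 0.4, 4: 0.2} as the number of steps grows. An unweighted simple random walk would instead over-represent the high-degree centre.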