Random walk sampling methods have been widely used in graph sampling in recent years, while it has bias towards higher degree nodes in the sample. To overcome this deficiency, classical methods such as MHRW design weighted walking by repeating low-degree nodes while rejecting high-degree nodes, so that the long-term behavior of Markov chain can achieve uniform distribution. This modification, however, may make the sampler stay in the same node for several times, leading to undersampling. To address this issue, we propose a sampling framework that only need current and candidate node degree to improve the performance of graph sampling methods. We also extend our original idea to a more general framework. Our extended IDRW method finds a balance between the large deviation problem of SRW and sample rejection problem in MHRW. We evaluate our technique in simulation by running extensive experiments on various real-world datasets, and the result show that our method improves the accuracy compared with the state of art techniques. We also investigate the effect of the parameter and give the suggested range for a better usage in application.
翻译:近些年来,在图表抽样中广泛使用随机步行抽样方法,但这种抽样方法偏向于抽样中较高的节点。为了克服这一缺陷,传统方法,如MHRW设计,通过重复低度节点来加权行走,同时拒绝高度节点,这样Markov链的长期行为可以实现统一分布。但是,这种修改可能会使取样器在同一个节点中停留好几次,从而导致抽样调查不足。为了解决这一问题,我们提议了一个仅需要当前和候选节点的抽样框架来改进图表抽样方法的性能。我们还将我们原先的想法扩大到一个更普遍的框架。我们扩大的IDRW方法在SRW的大偏差问题和MHRW的样本拒绝问题之间找到平衡点。我们通过在各种真实世界数据集上进行广泛的实验来评估我们的模拟技术,结果显示我们的方法比艺术技术的状态提高了准确性。我们还调查参数的效果,并给出更好的应用范围。