Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Count Estimation

This work considers the general task of estimating the sum of a bounded function over the edges of a graph when only neighborhood queries are available and accessing the entire network is prohibitively expensive. To estimate this sum, prior work proposes Markov chain Monte Carlo (MCMC) methods that use random walks started at a seed vertex and whose equilibrium distribution is the uniform distribution over all edges. This eliminates the need to iterate over all edges. Unfortunately, these existing estimators do not scale to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability by stratifying the Markov chain state space into ordered strata with a new technique that we call sequential stratified regenerations. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. We empirically evaluate our method by applying Ripple to the task of estimating the counts of connected induced subgraphs in an input graph. There, we demonstrate that Ripple is accurate and can estimate counts of subgraphs with up to 12 nodes. This scale has been considered unreachable not only by prior MCMC-based methods but also by other sampling approaches. For instance, in this target application, we present results in which the Markov chain state space is as large as $10^{43}$ and Ripple computes estimates in under 4 hours on average.
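The abstract describes the prior MCMC estimators only at a high level. As a point of reference, here is a minimal sketch of a plain regenerative (tour-based) random-walk estimator for a sum over edges. It is not Ripple and does not implement sequential stratification. It assumes a connected, simple, undirected graph stored in a hypothetical adjacency dict `adj`, a symmetric function `f`, and a hypothetical helper name `regenerative_edge_sum`. The sketch rests on the renewal-reward identity for a simple random walk. A tour that starts at seed vertex $v$ and stops on its first return to $v$ collects, in expectation, $\sum_{(u,w)} f(u,w) / d(v)$, where the sum runs over directed edges.

```python
import random


def regenerative_edge_sum(adj, f, seed_vertex, num_tours, rng):
    """Estimate sum_{undirected {u,w}} f(u, w) with simple-random-walk tours.

    Hypothetical sketch (not the Ripple algorithm). Assumes a connected,
    simple, undirected graph given as adj: vertex -> list of neighbours.
    The walk only ever queries the neighbours of its current vertex.
    """
    d_seed = len(adj[seed_vertex])
    total = 0.0
    for _ in range(num_tours):
        u = seed_vertex
        tour_reward = 0.0
        while True:
            w = rng.choice(adj[u])  # one step traverses one directed edge (u, w)
            tour_reward += f(u, w)
            u = w
            if u == seed_vertex:  # regeneration: the tour ends at the seed
                break
        total += tour_reward
    # Renewal-reward: E[tour reward] = (sum of f over directed edges) / d(seed).
    # A symmetric f counts every undirected edge twice, hence the division by 2.
    return d_seed * (total / num_tours) / 2.0


if __name__ == "__main__":
    # Triangle 0-1-2 plus a pendant vertex 3 attached to 2: four edges.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    est = regenerative_edge_sum(adj, lambda u, w: 1.0, 0, 20000, random.Random(0))
    print(est)  # close to 4, the number of edges, for f = 1
```

The tours are independent, so they are easy to parallelize. However, the expected tour length is $2|E|/d(v)$, which grows with the number of edges. This illustrates why plain regeneration can become expensive on very large graphs.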
