We study the following problem: given an integer $k \ge 3$ and a simple graph $G$, sample a connected induced $k$-node subgraph of $G$ uniformly at random. This is a fundamental graph mining primitive with applications in social network analysis, bioinformatics, and more. Surprisingly, no efficient algorithm is known for uniform sampling; the only somewhat efficient algorithms available yield samples that are only approximately uniform, with running times that are unclear or suboptimal. In this work we provide: (i) a near-optimal mixing time bound for a well-known random walk technique, (ii) the first efficient algorithm for truly uniform graphlet sampling, and (iii) the first sublinear-time algorithm for $\epsilon$-uniform graphlet sampling.
翻译:我们研究以下问题:考虑到一个整数 $k ge 3$ 和一个简单的图表 $G$,我们随机抽样一个相连的诱导 $k$-node 子集, 平均为$G$。 这是一个基本的图形开采原始, 其应用包括社会网络分析、 生物信息学等。 令人惊讶的是, 统一取样时并不知道有效的算法; 仅有的比较有效的算法, 其运行时间不明确或不理想。 在这项工作中, 我们提供了:(一) 一个接近最佳的混合时间, 用于一种众所周知的随机步行技术, (二) 真正统一的图形取样的第一个高效算法, 以及 (三) 用于 $\ ebsilon$- unformat piglet 取样的第一个子线性算法。