We study the graphlet sampling problem: given an integer $k \ge 3$ and a simple graph $G=(V,E)$, sample a connected induced $k$-node subgraph of $G$ (also called a $k$-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. In this work, we give the following results: 1) A near-tight bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$. In particular, we show that the random walk mixes in time $\tilde{\Theta}(t(G) \cdot \rho(G)^{k-1})$, where $t(G)$ is the mixing time of $G$ and $\rho(G)$ is the ratio between its maximum and minimum degree. 2) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time $O(k^2 n + m)$ and space $O(n)$, and a sampling phase that uses $k^{O(k)} O(\log \Delta)$ time per sample. 3) A near-optimal algorithm for $\epsilon$-uniform graphlet sampling. The preprocessing takes time $k^{O(k)} O\big(\frac{1}{\epsilon}\, n \log n \big)$ and space $O(n)$, and the sampling takes $k^{O(k)} O\big((\frac{1}{\epsilon})^{10}\log \frac{1}{\epsilon} \big)$ expected time per sample.
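To make the problem statement concrete, the following is a minimal brute-force sketch of uniform $k$-graphlet sampling on a tiny graph. It is not one of the algorithms summarized above: it enumerates all $k$-node subsets, which is only feasible for very small inputs, and serves purely as a reference definition of the target distribution. The helper names (`build_adjacency`, `enumerate_graphlets`, `sample_graphlet`, `degree_ratio`) are hypothetical and chosen for illustration; `degree_ratio` computes the quantity $\rho(G)$ under the assumption that $G$ has no isolated nodes.

```python
import random
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(nodes, adj):
    """Check whether the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def enumerate_graphlets(adj, k):
    """List every k-graphlet as a frozenset of nodes (brute force over all k-subsets)."""
    return [
        frozenset(subset)
        for subset in combinations(sorted(adj), k)
        if is_connected(subset, adj)
    ]


def sample_graphlet(adj, k, rng=random):
    """Return one k-graphlet chosen uniformly at random among all k-graphlets."""
    graphlets = enumerate_graphlets(adj, k)
    if not graphlets:
        raise ValueError("graph has no connected induced k-node subgraph")
    return rng.choice(graphlets)


def degree_ratio(adj):
    """rho(G): maximum degree divided by minimum degree (assumes no isolated nodes)."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    return max(degrees) / min(degrees)


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3 with a pendant node 4 attached to node 3.
    adj = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)])
    print(len(enumerate_graphlets(adj, 3)))  # number of 3-graphlets
    print(sorted(sample_graphlet(adj, 3)))   # one uniform sample
    print(degree_ratio(adj))                 # rho(G)
```

Because every graphlet appears exactly once in the enumerated list, `rng.choice` yields the exact uniform distribution. The cost of enumerating all $\binom{n}{k}$ subsets is what the random-walk and preprocessing-based approaches described in the abstract are designed to avoid.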