We study the graphlet sampling problem: given an integer $k \ge 3$ and a simple graph $G=(V,E)$ with $n=|V|$ and $m=|E|$, sample a connected induced $k$-node subgraph of $G$ (also called a $k$-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. In this work, we give the following results. (1) A near-tight bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$. In particular, ignoring $k^{O(k)}$ factors, we show that the random walk mixes in time $\tilde{\Theta}(t(G) \cdot \rho(G)^{k-1})$, where $t(G)$ is the mixing time of $G$ and $\rho(G)$ is the ratio between its maximum and minimum degree. (2) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time ${O}(n k^2 \ln k + m)$ and space $O(n)$, and a sampling phase that uses $k^{O(k)} \cdot O(\log \Delta)$ time per sample, where $\Delta$ is the maximum degree of $G$. (3) A near-optimal algorithm for $\epsilon$-uniform graphlet sampling. The preprocessing takes time $O\big(\frac{k^6}{\epsilon}\, n \log n \big)$ and space $O(n)$, and the sampling takes $k^{O(k)} \cdot O\big((\frac{1}{\epsilon})^{10}\log \frac{1}{\epsilon} \big)$ expected time per sample.
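To make result (1) concrete, here is a minimal Python sketch of a $k$-graphlet random walk. It assumes $G$ is given as a dict mapping each node to its set of neighbours. Two $k$-graphlets are treated as adjacent when they share exactly $k-1$ nodes. The function names, the lazy step (stay put with probability 1/2 to ensure aperiodicity), and the brute-force enumerator are illustrative assumptions, not the paper's implementation. Because this adjacency relation is symmetric, the walk's stationary distribution is proportional to each graphlet's degree in the graphlet graph rather than uniform. Obtaining uniform samples therefore needs a correction, for example reweighting by $1/\deg$. The enumerator is only a ground truth for tiny graphs, and `degree_ratio` computes $\rho(G)$.

```python
import random
from itertools import combinations


def is_connected(nodes, adj):
    """True if the subgraph of `adj` induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def all_graphlets(adj, k):
    """Brute-force list of all k-graphlets; only feasible for tiny graphs."""
    return [frozenset(c) for c in combinations(adj, k) if is_connected(c, adj)]


def graphlet_neighbors(g, adj):
    """k-graphlets obtained from g by swapping one node out for another."""
    g = frozenset(g)
    result = set()
    for u in g:
        rest = g - {u}
        # The new node must touch `rest`, otherwise rest | {w} is disconnected.
        candidates = {w for x in rest for w in adj[x]} - g
        for w in candidates:
            h = rest | {w}
            if is_connected(h, adj):
                result.add(h)
    return result


def graphlet_random_walk(adj, start, steps, rng):
    """Lazy random walk on the graph whose vertices are the k-graphlets of G."""
    g = frozenset(start)
    for _ in range(steps):
        if rng.random() < 0.5:  # lazy step: stay put (ensures aperiodicity)
            continue
        # Sort for reproducibility under a seeded rng (assumes comparable labels).
        nbrs = sorted(graphlet_neighbors(g, adj), key=sorted)
        if nbrs:
            g = rng.choice(nbrs)
    return g


def degree_ratio(adj):
    """rho(G): ratio between the maximum and minimum degree of G."""
    degrees = [len(adj[v]) for v in adj]
    return max(degrees) / min(degrees)


if __name__ == "__main__":
    # 4-cycle 0-1-2-3-0: every 3-node subset induces a path.
    c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(len(all_graphlets(c4, 3)))  # 4
    print(sorted(graphlet_random_walk(c4, {0, 1, 2}, 100, random.Random(0))))
```

On the 4-cycle, every 3-graphlet has the same degree (3), so the walk is already uniform there. On a star with three leaves, every 3-graphlet has degree 2 while $\rho(G)=3$. In that graph, removing the centre never yields a valid neighbour.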