A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher-order organizational structure of $G$ beyond edge relationships and have therefore found wide application in graph clustering, community detection, and the analysis of biological and physical networks, to name a few. In these applications, the cut structure of motifs plays a crucial role: vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif crossing the cut, as opposed to just the number of edges. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G'$ with only $\widetilde{O}(n/\epsilon^2)$ edges such that, for every constant-size motif $M$ and every cut, the weighted number of copies of $M$ crossing the cut in $G'$ is within a $1+\epsilon$ factor of the number of copies of $M$ crossing the cut in $G$. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges, which requires us to extend and strengthen the concept of cut sparsifiers, introduced in seminal prior work, to the motif setting. We adapt the importance sampling framework through the viewpoint of hypergraph sparsification by deriving the edge sampling probabilities from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in $G$ to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs.
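To make the notion of motif instances crossing a cut concrete, the following minimal Python sketch compares the edge cut with the triangle-motif cut on a toy undirected graph. It is an illustration only, not the sparsification algorithm: the function names, the example graph, and the choice of the triangle as the motif $M$ are all assumptions, as is the convention that an instance crosses a cut when it has at least one vertex on each side.

```python
def triangles(edges):
    """Return all triangles of an undirected graph as sorted vertex triples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Any common neighbour w of u and v closes a triangle {u, v, w}.
        for w in adj[u] & adj[v]:
            found.add(tuple(sorted((u, v, w))))
    return sorted(found)


def motif_cut_size(instances, side):
    """Count motif instances with at least one vertex on each side of the cut.

    Assumption: this vertex-based crossing rule is used for illustration.
    """
    side = set(side)
    return sum(1 for inst in instances
               if 0 < len(side.intersection(inst)) < len(inst))


def edge_cut_size(edges, side):
    """Count edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(1 for u, v in edges if (u in side) != (v in side))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
tris = triangles(edges)

# Cutting along the bridge severs one edge but no triangle.
print(edge_cut_size(edges, {0, 1, 2}), motif_cut_size(tris, {0, 1, 2}))  # 1 0

# Isolating vertex 0 severs two edges and one triangle.
print(edge_cut_size(edges, {0}), motif_cut_size(tris, {0}))  # 2 1
```

On this toy graph the two cut functions rank the cuts differently, which is why a sparsifier that preserves only edge cuts need not preserve motif cuts.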