A variety of tasks on dynamic graphs, including anomaly detection, community detection, compression, and graph understanding, have been formulated as problems of identifying constituent (near) bi-cliques (i.e., complete bipartite graphs). Even when we restrict our attention to maximal ones, there can be exponentially many near bi-cliques, and thus finding all of them is practically impossible for large graphs. Then, two questions naturally arise: (Q1) What is a "good" set of near bi-cliques? That is, given a set of near bi-cliques in the input dynamic graph, how should we evaluate its quality? (Q2) Given a large dynamic graph, how can we rapidly identify a high-quality set of near bi-cliques in it? Regarding Q1, we measure how concisely, precisely, and exhaustively a given set of near bi-cliques describes the input dynamic graph. We combine these three perspectives systematically on the Minimum Description Length principle. Regarding Q2, we propose CutNPeel, a fast search algorithm for a high-quality set of near bi-cliques. By adaptively re-partitioning the input graph, CutNPeel reduces the search space and at the same time improves the search quality. Our experiments using six real-world dynamic graphs demonstrate that CutNPeel is (a) High-quality: providing near bi-cliques of up to 51.2% better quality than its state-of-the-art competitors, (b) Fast: up to 68.8x faster than the next-best competitor, and (c) Scalable: scaling to graphs with 134 million edges. We also show successful applications of CutNPeel to graph compression and pattern discovery.
翻译:动态图形上的各种任务, 包括异常检测、 社区检测、 压缩、 图形理解, 被设计成在识别元素( 近点) 双层( 完整的双层图) 上的问题。 即使我们把注意力限制在最大点上, 也可能有很多接近双层的指数性任务, 从而找到所有这些任务对于大图来说几乎是不可能的 。 然后自然产生两个问题 : ( Q1 ) 哪些是接近双层的“ 良好”? 也就是说, 在输入动态图形中, 一组接近双层的“ 良好 ”, 我们应该如何评估其质量? ( Q2 ), 在大型动态图表中, 我们如何快速识别高质量的双层( 完整)? 关于 Q1, 我们用简洁、 精确和完整的方式测量输入输入输入 。 我们将这些三个观点系统地结合到最小度的 。 关于 Q2, 我们提议 CutNPePeal, 一个快速搜索算法, 用于近点的高质量 近点 质量? ( Q2 ), 如何快速地辨测算: 上, 快速地, 直位的直位( 平级的平级的平级搜索, 我们的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级的平级 ) 。