Over the years, frequent subgraphs have been an important class of target patterns in the pattern-mining literature, where most work deals with databases that hold many graph transactions, e.g., the chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to prune candidate patterns efficiently. In the emerging setting of single-graph databases, such as the Google Knowledge Graph and the Facebook social graph, the traditional support measure becomes trivial (either 0 or 1). However, to the best of our knowledge, every attempt to redefine support for a single graph has produced a measure that either loses DCP or is no longer semantically intuitive. This paper addresses pattern mining in the single-graph setting. We resolve this "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern in which a vertex is embedded, and such a pattern is frequent if it is shared by a large fraction of vertices (above a given threshold). We show that the new patterns not only maintain DCP but also carry semantics as meaningful as those of subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of our algorithms on relatively large graphs, as well as their ability to mine interesting knowledge not discovered by prior work.
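The neighborhood-based support can be illustrated with a minimal sketch. It assumes a deliberately simplified definition that is not necessarily the one used in this paper: a vertex's neighborhood is the set of (edge label, neighbor label) pairs on its outgoing edges, a pattern is a set of such pairs, and a vertex shares a pattern if its neighborhood contains it. Support is then the fraction of vertices sharing the pattern. Adding pairs to a pattern can only shrink the set of supporting vertices, which is the downward-closure property that lets an Apriori-style levelwise search prune candidates. The toy graph, the function names, and the threshold are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical single-graph database: vertex labels and labeled directed
# edges written as (source, edge_label, target).
LABELS = {"alice": "Person", "bob": "Person", "carol": "Person",
          "acme": "Company", "nyc": "City"}
EDGES = [("alice", "worksAt", "acme"), ("alice", "livesIn", "nyc"),
         ("bob", "worksAt", "acme"), ("bob", "livesIn", "nyc"),
         ("carol", "livesIn", "nyc")]


def neighborhood(v, edges, labels):
    """Simplified (assumed) 1-hop neighborhood: set of (edge_label, neighbor_label) pairs."""
    return frozenset((el, labels[t]) for s, el, t in edges if s == v)


def support(pattern, nbhds):
    """Fraction of vertices whose neighborhood contains `pattern` (assumes >= 1 vertex)."""
    hits = sum(1 for items in nbhds.values() if pattern <= items)
    return Fraction(hits, len(nbhds))


def frequent_neighborhoods(edges, labels, threshold):
    """Levelwise search returning {pattern: support} for all frequent patterns."""
    nbhds = {v: neighborhood(v, edges, labels) for v in labels}
    # Level 1: every single pair that occurs in some neighborhood.
    level = {frozenset([p]) for items in nbhds.values() for p in items}
    result = {}
    while level:
        frequent = {}
        for p in level:
            s = support(p, nbhds)
            if s >= threshold:
                frequent[p] = s
        result.update(frequent)
        # Join frequent k-patterns into (k+1)-candidates and keep a candidate
        # only if all of its k-subpatterns are frequent (downward closure).
        level = set()
        for a, b in combinations(frequent, 2):
            c = a | b
            if len(c) == len(a) + 1 and all(c - {x} in frequent for x in c):
                level.add(c)
    return result


if __name__ == "__main__":
    found = frequent_neighborhoods(EDGES, LABELS, Fraction(2, 5))
    for pattern, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), s)
```

Exact `Fraction` arithmetic is used so that a support equal to the threshold is never lost to floating-point rounding.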