With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze a graph database containing an extensive collection of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database by a set of subgraph structures, they often end up with similar or homogenous topologies, which is undesirable in many graph applications. To address this limitation, we propose the Top-k Edge-Diversified Patterns Discovery problem to retrieve a set of subgraphs that cover the maximum number of edges in a database. To efficiently process such query, we present a generic and extensible framework called Ted which achieves a guaranteed approximation ratio to the optimal result. Two optimization strategies are further developed to improve the performance. Experimental studies on real-world datasets demonstrate the superiority of Ted to traditional techniques.
翻译:由于来自不同储存库的图表数量成倍增加,因此非常需要分析一个包含广泛收集中小数据图表的图表数据库(例如,化学化合物),虽然提议进行子图查点和子图开采,以便通过一组子图结构将洞察力带入一个图表数据库,但最终往往出现类似的或同质的地形,在许多图形应用中这是不可取的。为解决这一局限性,我们提议采用“顶层边缘发现模式”来检索一套覆盖数据库中最大边缘数的子图。为了有效处理这种查询,我们提出了一个称为特德的通用和可扩展框架,它能保证接近最佳结果的比率。为了改进性能,将进一步制定两种优化战略。关于现实世界数据集的实验研究显示特德对传统技术的优越性。