通过有效取样高效率地查找一个最大 clique 概要 (Efficiently Finding a Maximal Clique Summary via Effective Sampling)

Maximal clique enumeration (MCE) is a fundamental problem in graph theory and is used in many applications, such as social network analysis, bioinformatics, intelligent agent systems, cyber security, etc. Most existing MCE algorithms focus on improving the efficiency rather than reducing the output size. The output unfortunately could consist of a large number of maximal cliques. In this paper, we study how to report a summary of less overlapping maximal cliques. The problem was studied before, however, after examining the pioneer approach, we consider it still not satisfactory. To advance the research along this line, our paper attempts to make four contributions: (a) we propose a more effective sampling strategy, which produces a much smaller summary but still ensures that the summary can somehow witness all the maximal cliques and the expectation of each maximal clique witnessed by the summary is above a predefined threshold; (b) we prove that the sampling strategy is optimal under certain optimality conditions; (c) we apply clique-size bounding and design new enumeration order to approach the optimality conditions; and (d) to verify experimentally, we test eight real benchmark datasets that have a variety of graph characteristics. The results show that our new sampling strategy consistently outperforms the state-of-the-art approach by producing smaller summaries and running faster on all the datasets.

翻译：最大分层计数(MCE)是图表理论中的一个根本问题,许多应用,如社会网络分析、生物信息学、智能代理系统、网络安全等,都使用这种计数法(MCE),这在图表理论中是一个根本性问题,许多应用都使用这种计数法,例如社会网络分析、生物信息学、智能代理系统、网络安全等。多数现有的MCE算法侧重于提高效率,而不是降低产出大小。不幸的是,产出可能包含大量的最大分层。在本文件中,我们研究如何报告一个不那么重叠的最大分层的汇总。然而,在审查先驱方法之后,我们以前曾研究过这一问题,我们认为这一问题仍然不令人满意。为了推进这一思路的研究,我们的文件试图作出四项贡献:(a) 我们提出一个更加有效的抽样战略,它产生一个小得多的摘要,但是仍然确保摘要能够以某种方式见证所有最大分层,而摘要所见证的每种最大分层的预期值都高于预先确定的阈值。 (b) 我们证明抽样战略在某些最佳条件下是最佳的;(c)我们应用结尺寸的捆绑和设计新的分查方法,以接近最佳条件;以及(d)核查各种新分式的图表,我们用较小的图表来检验。