Software module clustering is an unsupervised learning method that groups software entities (e.g., classes, modules, or files) with similar features. The resulting clusters can be used to study, analyze, and understand the structure and behavior of those entities. Achieving optimal clustering results is challenging, and researchers have addressed many aspects of software module clustering over the past decade. It is therefore essential to present the research evidence published in this area. In this study, 143 research papers on software module clustering, drawn from well-known literature databases, were reviewed to extract useful data. The extracted data were then used to answer several research questions regarding state-of-the-art clustering approaches, applications of clustering in software engineering, clustering processes, clustering algorithms, and evaluation methods. Several research gaps and challenges in software module clustering are also discussed to provide a useful reference for researchers in this field.