Component-Based Software Engineering (CBSE) seeks to promote software reuse by integrating existing software modules into the development process. However, such reusable components are not readily available, and obtaining them is costly and time-consuming. As an alternative, components can be extracted from pre-existing object-oriented (OO) software. In this work, we evaluate two community detection algorithms, Leiden and Infomap, for the task of software component identification. Treating `components' as `communities', we aim to evaluate how independent, yet cohesive, the components are when extracted by structurally informed algorithms. We analyze 412 Java systems and evaluate the cohesion of the extracted communities using four document representation techniques. The evaluation aims to find which algorithm extracts the most semantically cohesive, yet well-separated, communities. The results show that both algorithms perform well, but each has its own strengths. Leiden extracts components that are less cohesive but better separated and better clustered, and that depend more on similar components. Infomap, on the other hand, creates more cohesive, slightly overlapping clusters that are less likely to depend on other semantically similar components.
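To make the notions of semantic cohesion and separation concrete, the following minimal sketch scores a given partition of classes. It rests on several assumptions that the abstract does not confirm: a plain bag-of-words count stands in for the four document representation techniques, cohesion is taken as the mean pairwise cosine similarity inside a community, and separation as the mean cosine similarity across two communities. The class names, texts, and the partition are hypothetical toy data, and the community detection step itself (Leiden or Infomap) is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bag_of_words(text):
    """Term counts for one class (assumed stand-in for a document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion(community, vectors):
    """Mean pairwise similarity among the members of one community."""
    pairs = list(combinations(community, 2))
    if not pairs:
        return 1.0  # assumption: a singleton community counts as fully cohesive
    return sum(cosine(vectors[u], vectors[v]) for u, v in pairs) / len(pairs)


def separation(first, second, vectors):
    """Mean similarity across two communities; lower means better separated."""
    pairs = [(u, v) for u in first for v in second]
    return sum(cosine(vectors[u], vectors[v]) for u, v in pairs) / len(pairs)


# Hypothetical toy system: identifiers and comments flattened into text per class.
classes = {
    "OrderService": "order create order cancel payment",
    "OrderRepository": "order save order find",
    "HtmlRenderer": "html render template page",
    "TemplateCache": "template cache page html",
}
vectors = {name: bag_of_words(text) for name, text in classes.items()}

# Hypothetical partition, as a structural community detection algorithm might return it.
communities = [
    ["OrderService", "OrderRepository"],
    ["HtmlRenderer", "TemplateCache"],
]

if __name__ == "__main__":
    for members in communities:
        print(members, round(cohesion(members, vectors), 3))
    print("separation:", round(separation(communities[0], communities[1], vectors), 3))
```

Under these assumptions, a partition is preferable when its communities score high on cohesion and low on separation, which mirrors the trade-off the abstract reports between the two algorithms.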