Maintaining existing software requires a large amount of time for comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Because software systems differ widely in domain, structure, and behaviour, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Studies that introduce new clustering techniques usually validate their approaches on a specific domain, which might limit their generalisability. If the chosen test subjects represent only a narrow slice of the whole picture, researchers risk being unable to establish the external validity of their findings. This work aims to fill this gap by introducing a new approach, Explaining Software Clustering for Remodularisation, to evaluate the effectiveness of different software clustering approaches. This work focuses on hierarchical clustering and Bunch clustering algorithms and provides information about their suitability according to the features of the software, which in turn enables the selection of the most suitable algorithm and configuration from our existing pool of choices for a particular software system. The proposed framework is tested on 30 open source software systems of varying sizes and domains, and the results demonstrate that it can characterise both the strengths and weaknesses of the analysed software clustering algorithms using software features extracted from the code. The proposed approach also provides a better understanding of the algorithms' behaviour through the application of dimensionality reduction techniques.