Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. Beyond using these algorithms to recommend source code changes, we also evaluate them empirically. Methods: Our investigation includes seven open-source projects from which we extracted the source code change history at the file level. We applied four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them in terms of performance (Precision, Recall, and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori, may outperform other algorithms in some cases, the results are not consistent across all the software projects, most likely because of the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems to be an efficient approach in terms of execution time.
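As a rough illustration of the idea summarized above, the following sketch mines frequently co-changed file pairs from a file-level change history and scores a recommendation with Precision, Recall, and F-measure. It is not the implementation evaluated in the paper: it uses plain support counting restricted to itemsets of size two rather than Apriori, FP-Growth, Eclat, or Relim. The function names, the toy history, and the `min_support` threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_pairs(commits, min_support):
    """Count how often each pair of files changes in the same commit
    and keep the pairs seen at least min_support times (hypothetical helper)."""
    pair_counts = Counter()
    for files in commits:
        # Sort so that each unordered pair has one canonical key.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def recommend(changed_file, frequent_pairs):
    """Return the files that frequently co-changed with changed_file."""
    recommended = set()
    for a, b in frequent_pairs:
        if a == changed_file:
            recommended.add(b)
        elif b == changed_file:
            recommended.add(a)
    return recommended


def precision_recall_f(recommended, actual):
    """Compare a recommended file set with the files that actually changed."""
    hits = len(recommended & actual)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy change history (assumed data): one list of changed files per commit.
history = [
    ["a.py", "b.py"],
    ["a.py", "b.py", "c.py"],
    ["a.py", "c.py"],
    ["d.py"],
]

frequent = mine_cochange_pairs(history, min_support=2)
suggested = recommend("a.py", frequent)
precision, recall, f_measure = precision_recall_f(suggested, actual={"b.py"})
```

With this toy history, changing `a.py` yields `b.py` and `c.py` as suggestions. If only `b.py` actually changes alongside it, Recall is 1.0 while Precision is 0.5, which shows why the three measures are reported together.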