In the past two decades, several Machine Learning (ML) libraries have become freely available, and many studies have used them to carry out empirical investigations on predictive Software Engineering (SE) tasks. However, the differences stemming from using one library over another have been overlooked, under the implicit assumption that any of these libraries would provide the user with the same or very similar results. This paper aims to raise awareness of the differences incurred when using different ML libraries for software development effort estimation (SEE), one of the most widely studied SE prediction tasks. To this end, we investigate 4 deterministic machine learners as provided by 3 of the most popular open-source ML libraries written in different languages (namely, Scikit-Learn, Caret, and Weka). We carry out a thorough empirical study comparing the performance of these machine learners on 5 SEE datasets in the two most common SEE scenarios (i.e., out-of-the-box-ml and tuned-ml), together with an in-depth analysis of the documentation and code of their APIs. The results of our study reveal that the predictions provided by the 3 libraries differ in 95% of the cases on average across a total of 105 cases studied. These differences are significantly large in most cases and yield misestimations of up to approx. 3,000 hours per project. Moreover, our API analysis reveals that these libraries give users different levels of control over the parameters they can manipulate, and exhibit an overall lack of clarity and consistency that might mislead users. Our findings highlight that the ML library is an important design choice for SEE studies, as it can lead to differences in performance; yet such differences are under-documented. We conclude by highlighting open challenges, with suggestions for the developers of these libraries as well as for the researchers and practitioners who use them.
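As a minimal sketch of why "out-of-the-box" runs can diverge across libraries: each library ships its own default hyperparameters for the same learner, and these defaults are not standardized. The snippet below inspects Scikit-Learn's defaults for a decision-tree regressor; the specific comparison to Weka's and Caret's defaults is an assumption for illustration, not a result taken from the study.

```python
# Illustration (assumed example, not from the paper): default hyperparameters
# are library-specific, so running the "same" learner out of the box in
# Scikit-Learn, Caret, or Weka can yield different predictions.
from sklearn.tree import DecisionTreeRegressor

params = DecisionTreeRegressor().get_params()
# Scikit-Learn grows trees with no depth limit and min_samples_leaf=1 by
# default; other libraries' tree learners may prune or stop earlier.
print(params["max_depth"], params["min_samples_leaf"])
```

Documenting and reporting such defaults explicitly is one way users of these libraries can make SEE results comparable across toolkits.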