Machine Learning (ML) is being used in multiple disciplines due to its powerful capability to infer relationships within data. Software Engineering (SE) is one of those disciplines, with ML applied to tasks such as software categorization, bug prediction, and testing. Beyond these applications, some studies have sought to detect and understand possible pitfalls and issues when using ML. However, to the best of our knowledge, only a few studies have focused on presenting best practices or guidelines for applying ML in different domains. Moreover, the practices presented in previous literature (i) are domain-specific (e.g., concrete practices in biomechanics), (ii) are few in number, or (iii) lack rigorous validation and appear only in gray literature. In this paper, we present a study listing 127 ML best practices, identified by systematically mining 242 posts from 14 different Stack Exchange (STE) websites and validated by four independent ML experts. The practices are organized into categories related to the different stages of the implementation process of an ML-enabled system; for each practice, we include explanations and examples. In all cases, the provided examples focus on SE tasks. We expect this list to help practitioners, in particular newcomers to this new area at the intersection of software engineering and machine learning, better understand the practices and use ML in a more informed way.