Context: Advancements in machine learning (ML) have led to a shift from the traditional view of software development, where algorithms are hard-coded by humans, to ML systems materialized through learning from data. Therefore, we need to revisit our ways of developing software systems and consider the particularities required by these new types of systems. Objective: The purpose of this study is to systematically identify, analyze, summarize, and synthesize the current state of software engineering (SE) research for engineering ML systems. Method: I performed a systematic literature review (SLR). I systematically selected a pool of 141 studies from SE venues and then conducted a quantitative and qualitative analysis using the data extracted from these studies. Results: The non-deterministic nature of ML systems complicates all SE aspects of engineering ML systems. Despite increasing interest from 2018 onwards, the results reveal that none of the SE aspects has a mature set of tools and techniques. Testing is by far the most popular area among researchers. Even for testing ML systems, engineers have only some tool prototypes and solution proposals with weak experimental evidence. Many of the challenges of ML systems engineering were identified through surveys and interviews. Researchers should conduct experiments and case studies, ideally in industrial environments, to further understand these challenges and propose solutions. Conclusion: The results may benefit (1) practitioners in foreseeing the challenges of ML systems engineering; (2) researchers and academicians in identifying potential research questions; and (3) educators in designing or updating SE courses to cover ML systems engineering.