Machine Learning (ML) projects pose novel challenges in their development and productionisation compared with traditional software applications, though established principles and best practices for ensuring a project's software quality still apply. While using static analysis to catch code smells has been shown to improve software quality attributes, it is only a small piece of the software quality puzzle, especially for ML projects, given their additional challenges and the lower degree of Software Engineering (SE) experience among the data scientists who develop them. We introduce the novel concept of project smells, which treat deficits in project management as part of a more holistic perspective on software quality in ML projects. We also implemented mllint, an open-source static analysis tool, to help detect and mitigate these smells. Our research evaluates the concept of project smells in the industrial context of ING, a global bank and a large software- and data-intensive organisation. We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstructions and benefits to using static analysis tools such as mllint. Our findings indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development while requiring minimal configuration effort from the user.