The increasing availability of machine learning (ML) frameworks and tools, together with their promise to improve solutions to data-driven decision problems, has made ML techniques popular in software systems. However, end-to-end development of ML-enabled systems, as well as their seamless deployment and operation, remains a challenge. One reason is that developing and deploying ML-enabled systems involves three distinct workflows, perspectives, and roles: data science, software engineering, and operations. When these three perspectives are misaligned because of incorrect assumptions, they cause ML mismatches that can result in failed systems. We conducted an interview and survey study in which we collected and validated common types of mismatch that occur in end-to-end development of ML-enabled systems. Our analysis shows that each role prioritizes the importance of relevant mismatches differently, potentially contributing to these mismatched assumptions. In addition, the mismatch categories we identified can be specified as machine-readable descriptors, contributing to improved ML-enabled system development. In this paper, we report our findings and their implications for improving end-to-end ML-enabled system development.