Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development and deployment of ML applications, contributing to a crowded landscape of disconnected solutions targeted at different stages, or components, of the ML lifecycle. A lack of end-to-end ML pipeline visibility makes it hard to address any issues that may arise after a production deployment, such as unexpected output values or lower-quality predictions. In this paper, we propose a system that wraps around existing tools in the ML development stack and offers end-to-end observability. We introduce our prototype and our vision for mltrace, a platform-agnostic system that provides observability to ML practitioners by (1) executing predefined tests and monitoring ML-specific metrics at component runtime, (2) tracking end-to-end data flow, and (3) allowing users to ask arbitrary post-hoc questions about pipeline health.