The exploratory and iterative nature of developing and operating machine learning (ML) applications produces a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. To enable comparability, reproducibility, and traceability of these artifacts across the steps and iterations of the ML lifecycle, systems and tools have been developed to support their collection, storage, and management. The precise functional scope of such systems is often not obvious, which makes it challenging to compare candidates and to estimate synergy effects between them. In this paper, we give an overview of systems and platforms that support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.