Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline. Recognizing this importance, academia, industry, and government departments have proposed various research directions and policies. Although the ability to use existing data remains essential, the ability to build a dataset has become more important than ever. In light of this trend, we propose "Data Management Operation and Recipes" (DMOps), which is intended to guide the industry regardless of task or domain. In other words, this paper presents the concept of DMOps, derived from real-world experience. By offering a baseline for building data, we aim to help the industry streamline its data operations.