Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high-quality data. These range from traditional business data processing applications (e.g., "how much should we charge each of our customers this month?") to production ML systems such as recommendation engines. The fields of data engineering and ML engineering have arisen in recent years to manage these applications, and both include many interesting, novel tools and processes. In this paper, we discuss several lessons from data and ML engineering that could be interesting to apply in data-centric AI, based on our experience building data and ML platforms that serve thousands of applications at a range of organizations.