Data is crucial infrastructure for how artificial intelligence (AI) systems learn. However, these systems have so far been largely model-centric, prioritizing the model at the expense of data quality. Data quality issues undermine the performance of AI systems, particularly in downstream deployments and real-world applications. Data-centric AI (DCAI), an emerging concept, brings data, its quality, and its dynamism to the forefront of AI system development through an iterative and systematic approach. As one of the first overviews of the field, this article brings together data-centric perspectives and concepts to outline the foundations of DCAI. Specifically, it formulates six guiding principles for researchers and practitioners and suggests directions for the future advancement of DCAI.