The role of data in building AI systems has recently been significantly magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift in focus from model advancements to ensuring data quality and reliability. Although our community has continuously invested effort in enhancing data in different aspects, these efforts are often isolated initiatives targeting specific tasks. To facilitate collective initiatives in our community and push DCAI forward, we draw a big picture that brings together three general missions: training data development, inference data development, and data maintenance. We provide a top-level discussion of representative DCAI tasks and share our perspectives. Finally, we list open challenges. More resources are summarized at https://github.com/daochenzha/data-centric-AI