The role of data in building AI systems has recently been significantly magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model advancements to ensuring data quality and reliability. Although our community has continuously invested effort in enhancing data in different aspects, these efforts are often isolated initiatives on specific tasks. To facilitate a collective initiative in our community and push DCAI forward, we draw a big picture and bring together three general missions: training data development, evaluation data development, and data maintenance. We provide a top-level discussion of representative DCAI tasks and share our perspectives. Finally, we list open challenges to motivate future exploration.