High-quality, large-scale data are key to the success of AI systems. However, large-scale data annotation efforts often face a common set of challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) ensuring reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and pipelines saved in a reusable format. We show that Crowdaq significantly simplifies data annotation across a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.