Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

October 6, 2020

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie

From arXiv. Accepted to the demo track of EMNLP 2020.

High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.

