In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding examples that models struggle to predict correctly. Models trained on DADC-collected training data have been shown to be more robust in adversarial and out-of-domain settings, and are considerably harder for humans to fool. However, DADC is more time-consuming than traditional data collection and thus more costly per annotated example. In this work, we examine whether we can maintain the advantages of DADC without incurring the additional cost. To that end, we introduce Generative Annotation Assistants (GAAs), generator-in-the-loop models that provide real-time suggestions that annotators can either approve, modify, or reject entirely. We collect training datasets in twenty experimental settings and perform a detailed analysis of this approach for the task of extractive question answering (QA) under both standard and adversarial data collection. We demonstrate that GAAs provide significant efficiency benefits, with over a 30% annotation speed-up, while leading to over a 5x improvement in model fooling rates. In addition, we find that training on GAA-assisted data leads to higher downstream model performance on a variety of question answering tasks than training on adversarially collected data alone.