On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Researchers hope that models trained on these more challenging datasets will rely less on superficial patterns, and thus be less brittle. However, despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models. In this paper, we conduct a large-scale controlled study focused on question answering, assigning workers at random to compose questions either (i) adversarially (with a model in the loop); or (ii) in the standard fashion (without a model). Across a variety of models and datasets, we find that models trained on adversarial data usually perform better on other adversarial datasets but worse on a diverse collection of out-of-domain evaluation sets. Finally, we provide a qualitative analysis of adversarial (vs standard) data, identifying key differences and offering guidance for future research.

