Disfluencies are an under-studied topic in NLP, even though they are ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present Disfl-QA, a new challenge question answering dataset derived from SQuAD, in which humans introduce contextual disfluencies into previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting. We show that data augmentation methods partially recover the lost performance, and we also demonstrate the efficacy of using gold data for fine-tuning. We argue that large-scale disfluency datasets are needed for NLP models to become robust to disfluencies. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.
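To make the zero-shot degradation concrete, the following is a minimal sketch (not the paper's evaluation harness) of probing an off-the-shelf extractive QA model with a fluent question and a disfluent rewrite of it. The model checkpoint and the disfluent variant are illustrative assumptions; the fluent question and context are the well-known SQuAD Normandy example, and the correction-style rewrite ("no wait ...") mimics the kind of contextual disfluency Disfl-QA annotators introduce.

```python
# Sketch: compare a SQuAD-trained QA model on a fluent question vs. a
# hypothetical disfluent rewrite, using the Hugging Face QA pipeline.
from transformers import pipeline

# Any SQuAD-style extractive QA checkpoint works here; this one is an
# assumption for illustration, not the model used in the paper.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "The Normans were the people who in the 10th and 11th centuries "
    "gave their name to Normandy, a region in France."
)
fluent = "In what country is Normandy located?"
# Illustrative disfluent variant with a mid-question correction,
# in the style of Disfl-QA (not an actual example from the dataset).
disfluent = "In what region no wait in what country is Normandy located?"

for question in (fluent, disfluent):
    pred = qa(question=question, context=context)
    print(f"{question!r} -> {pred['answer']} (score={pred['score']:.2f})")
```

A drop in the model's answer confidence (or an outright wrong span) on the disfluent variant illustrates the failure mode the dataset is designed to measure.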