Neural Conversational QA tasks like ShARC require systems to answer questions based on the contents of a given passage. On studying recent state-of-the-art models on the ShARC QA task, we found indications that the models learn spurious clues/patterns in the dataset. Furthermore, we show that a heuristic-based program designed to exploit these patterns can achieve performance comparable to that of the neural models. In this paper, we share our findings about four types of patterns found in the ShARC corpus and describe how neural models exploit them. Motivated by these findings, we create and share a modified dataset that has fewer spurious patterns, consequently allowing models to learn better.