Question answering (QA) models for reading comprehension have achieved human-level accuracy on in-distribution test sets. However, they have been shown to lack robustness to challenge sets, whose distributions differ from those of training sets. Existing data augmentation methods mitigate this problem by simply augmenting training sets with synthetic examples sampled from the same distribution as the challenge sets. However, these methods assume that the distribution of a challenge set is known a priori, making them less applicable to unseen challenge sets. In this study, we focus on question-answer pair generation (QAG) to mitigate this problem. While most existing QAG methods aim to improve the quality of synthetic examples, we conjecture that diversity-promoting QAG can mitigate the sparsity of training sets and lead to better robustness. We present a variational QAG model that generates multiple diverse QA pairs from a paragraph. Our experiments show that our method can improve accuracy on 12 challenge sets, as well as in-distribution accuracy. Our code and data are available at https://github.com/KazutoshiShinoda/VQAG.