Question Generation (QG) is a fundamental NLP task for many downstream applications. Recent studies on open-book QG, where supportive question-context pairs are provided to models, have achieved promising progress. However, generating natural questions under the more practical closed-book setting, which lacks these supporting documents, remains a challenge. In this work, to learn better representations from the semantic information hidden in question-answer pairs under the closed-book setting, we propose a new QG model empowered by a contrastive learning module and an answer reconstruction module. We present a new closed-book QA dataset, WikiCQA, which contains abstractive long answers collected from a wiki-style website. In our experiments, we validate the proposed QG model on both public datasets and the new WikiCQA dataset. Empirical results show that the proposed QG model outperforms baselines in both automatic and human evaluation. In addition, we show how to leverage the proposed model to improve existing closed-book QA systems. We observe that by pre-training a closed-book QA model on our generated synthetic QA pairs, significant QA improvement can be achieved on both seen and unseen datasets, which further demonstrates the effectiveness of our QG model for enhancing unsupervised and semi-supervised QA.