Question Generation (QG) is a challenging Natural Language Processing task that aims to generate questions from given answers and context. Existing QG methods mainly focus on building or training models for specific QG datasets. These works are subject to two major limitations: (1) They are dedicated to specific QG formats (e.g., answer-extraction or multi-choice QG), so addressing a new QG format requires redesigning the QG model. (2) They achieve optimal performance only on the dataset they were most recently trained on. As a result, we have to train and maintain separate QG models for different QG datasets, which is resource-intensive and does not generalize. To solve these problems, we propose a model named Unified-QG based on lifelong learning techniques, which can continually learn QG tasks across different datasets and formats. Specifically, we first build a format-convert encoding to transform different kinds of QG formats into a unified representation. Then, a method named \emph{STRIDER} (\emph{S}imilari\emph{T}y \emph{R}egular\emph{I}zed \emph{D}ifficult \emph{E}xample \emph{R}eplay) is built to alleviate catastrophic forgetting in continual QG learning. We conduct extensive experiments on $8$ QG datasets across $4$ QG formats (answer-extraction, answer-abstraction, multi-choice, and boolean QG). The results demonstrate that Unified-QG can effectively and continually adapt to QG tasks when datasets and formats vary. In addition, we verify that a single trained Unified-QG model can improve the performance of $8$ Question Answering (QA) systems by generating synthetic QA data.
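The abstract does not spell out the format-convert encoding or the replay selection rule, so the following Python sketch is purely illustrative. It assumes that the unified representation is a single text string built from labelled fields, and that "difficult" examples are those with the highest training loss. The function names `encode_qg_example` and `select_difficult_examples`, the field labels, and the loss-based criterion are all hypothetical, and the similarity regularization part of STRIDER is not modelled here.

```python
from typing import List, Optional, Sequence


def encode_qg_example(
    qg_format: str,
    passage: str,
    answer: str,
    options: Optional[Sequence[str]] = None,
) -> str:
    """Serialize one QG example of any format into a single text input.

    Hypothetical encoding: the field labels and their order are assumptions,
    not the encoding used by Unified-QG.
    """
    parts = [f"format: {qg_format}", f"answer: {answer}"]
    if options:
        # Only multi-choice examples carry candidate options.
        parts.append("options: " + " | ".join(options))
    parts.append(f"passage: {passage}")
    return " ".join(parts)


def select_difficult_examples(
    examples: Sequence[str], losses: Sequence[float], k: int
) -> List[str]:
    """Keep the k examples with the highest loss for later replay.

    Assumption: difficulty is measured by per-example training loss.
    Ties are broken by original position.
    """
    if len(examples) != len(losses):
        raise ValueError("examples and losses must have the same length")
    order = sorted(range(len(examples)), key=lambda i: (-losses[i], i))
    return [examples[i] for i in order[:k]]
```

Under these assumptions, examples from every format pass through the same encoder before training, and the examples retained from one dataset are mixed into the training data of the next to reduce forgetting.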