We investigate the less-explored task of generating open-ended questions that are typically answered by multiple sentences. We first define a new question type ontology which differentiates the nuanced nature of questions better than widely used question words. A new dataset with 4,959 questions is labeled based on the new ontology. We then propose a novel question type-aware question generation framework, augmented by a semantic graph representation, to jointly predict question focuses and produce the question. Based on this framework, we further use both exemplars and automatically generated templates to improve controllability and diversity. Experiments on two newly collected large-scale datasets show that our model improves question quality over competitive comparisons based on automatic metrics. Human judges also rate our model outputs highly in answerability, coverage of scope, and overall quality. Finally, our model variants with templates can produce questions with enhanced controllability and diversity.