We propose a two-stage neural model for generating questions from documents. First, we train a neural key-phrase extractor on the answers in a question-answering corpus, so that the model estimates the probability that a word sequence in a document is one a human would select as a candidate answer. The predicted key phrases then serve as target answers that condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question-generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may in turn be used to improve machine reading systems or applied in educational settings.
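The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It shows only the data flow: stage one scores candidate spans and keeps the best ones as key phrases, and stage two produces one question per key phrase. Every name here (`KeyPhrase`, `extract_key_phrases`, `generate_questions`, `toy_scorer`, `toy_generator`) is hypothetical and not taken from the paper. The capitalization-based scorer and the cloze-style generator are deliberately trivial stand-ins for the neural key-phrase extractor and the sequence-to-sequence model with a copy mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class KeyPhrase:
    text: str
    start: int    # token index, inclusive
    end: int      # token index, exclusive
    score: float  # higher means "more answer-like"


def extract_key_phrases(
    tokens: List[str],
    scorer: Callable[[List[str], int, int], float],
    max_len: int = 3,
    top_k: int = 2,
) -> List[KeyPhrase]:
    """Stage 1: score every span of up to max_len tokens, keep the top_k."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            spans.append(KeyPhrase(" ".join(tokens[i:j]), i, j, scorer(tokens, i, j)))
    # list.sort is stable, so ties keep their left-to-right document order.
    spans.sort(key=lambda s: s.score, reverse=True)
    return spans[:top_k]


def generate_questions(
    tokens: List[str],
    phrases: List[KeyPhrase],
    generator: Callable[[List[str], KeyPhrase], str],
) -> List[Tuple[str, str]]:
    """Stage 2: condition the generator on each key phrase (the target answer)."""
    return [(p.text, generator(tokens, p)) for p in phrases]


def toy_scorer(tokens: List[str], start: int, end: int) -> float:
    # ASSUMPTION: stand-in for a trained extractor. Rewards capitalized
    # tokens and penalizes all others; it is not the paper's model.
    return float(sum(1 if t[0].isupper() else -1 for t in tokens[start:end]))


def toy_generator(tokens: List[str], phrase: KeyPhrase) -> str:
    # ASSUMPTION: stand-in for a seq2seq model with a copy mechanism.
    # It copies the context verbatim and replaces the answer span with "what".
    words = tokens[: phrase.start] + ["what"] + tokens[phrase.end :]
    return " ".join(words) + "?"


if __name__ == "__main__":
    doc = "Marie Curie won the Nobel Prize in 1911".split()
    key_phrases = extract_key_phrases(doc, toy_scorer)
    for answer, question in generate_questions(doc, key_phrases, toy_generator):
        print(f"{answer!r} -> {question!r}")
```

Passing the scorer and generator in as callables keeps the two stages decoupled, so in principle a trained model could replace either stand-in without changing the pipeline code.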