Generating high-quality question-answer pairs is a hard but meaningful task. Although previous work has achieved strong results on answer-aware question generation, these methods are difficult to apply in practice in the education field. This paper is the first to address the question-answer pair generation task on real-world examination data, and proposes a new unified framework on RACE. To capture the important information in the input passage, we first automatically generate (rather than extract) keyphrases, which reduces the task to the joint generation of keyphrase-question-answer triplets. Accordingly, we propose a multi-agent communication model that generates and optimizes the question and keyphrases iteratively, and then uses the generated question and keyphrases to guide answer generation. To establish a solid benchmark, we build our model on a strong generative pre-trained model. Experimental results show that our model yields substantial gains on the question-answer pair generation task. Moreover, we provide a comprehensive analysis of our model, suggesting new directions for this challenging task.
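The control flow described above (iteratively refine keyphrases and question, then generate the answer from both) can be sketched as follows. This is only an illustrative outline under stated assumptions, not the model from the paper: the function `generate_triplet`, the agent signatures, the `rounds` parameter, and the three toy agents are all hypothetical stand-ins for the neural generators, which the abstract does not specify.

```python
from typing import Callable, List, Tuple

# Hypothetical agent signatures (assumptions, not from the paper).
KeyphraseAgent = Callable[[str, str], List[str]]      # (passage, question) -> keyphrases
QuestionAgent = Callable[[str, List[str]], str]       # (passage, keyphrases) -> question
AnswerAgent = Callable[[str, str, List[str]], str]    # (passage, question, keyphrases) -> answer


def generate_triplet(
    passage: str,
    keyphrase_agent: KeyphraseAgent,
    question_agent: QuestionAgent,
    answer_agent: AnswerAgent,
    rounds: int = 2,
) -> Tuple[List[str], str, str]:
    """Alternate between the two agents, then generate the answer."""
    question = ""
    # Initial keyphrases are generated before any question exists.
    keyphrases = keyphrase_agent(passage, question)
    for _ in range(rounds):
        # Each agent conditions on the other's latest output.
        question = question_agent(passage, keyphrases)
        keyphrases = keyphrase_agent(passage, question)
    # The answer is guided by the final question and keyphrases.
    answer = answer_agent(passage, question, keyphrases)
    return keyphrases, question, answer


# Toy rule-based agents, used only to make the sketch runnable.
def toy_keyphrase_agent(passage: str, question: str) -> List[str]:
    # Ignores the question; picks the two longest distinct words.
    words = [w.strip(".,?!").lower() for w in passage.split()]
    distinct = list(dict.fromkeys(words))
    return sorted(distinct, key=lambda w: -len(w))[:2]


def toy_question_agent(passage: str, keyphrases: List[str]) -> str:
    return f"What does the passage say about {keyphrases[0]}?"


def toy_answer_agent(passage: str, question: str, keyphrases: List[str]) -> str:
    # Returns the first sentence that mentions the top keyphrase.
    for sentence in passage.split(". "):
        if keyphrases[0] in sentence.lower():
            return sentence.strip(".")
    return ""


if __name__ == "__main__":
    text = "Tom planted tomatoes in spring. The tomatoes ripened in August."
    print(generate_triplet(text, toy_keyphrase_agent, toy_question_agent, toy_answer_agent))
```

In a real system each agent would be a learned generator, and the number of communication rounds would be a design choice rather than the fixed default used here.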