Machine reading comprehension (MRC) is an important area in conversational agents and has drawn considerable attention. However, current MRC benchmarks share a notable limitation: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the naturalness expected of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to advance MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200K training examples with high-quality, fluent, and well-informed responses. Penguin is the first relatively large-scale benchmark for natural response generation in Chinese MRC. To address the challenges posed by Penguin, we develop two strong baselines: an end-to-end framework and a two-stage framework. We then design Prompt-BART, which fine-tunes a pre-trained generative language model with a mixture of prefix prompts on Penguin. Extensive experiments validate the effectiveness of this design.
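To make the prefix-prompt idea behind Prompt-BART concrete, below is a minimal sketch of fine-tuning a Chinese BART checkpoint with a mixture of prefix prompts, using the Hugging Face `transformers` API. The checkpoint name (`fnlp/bart-base-chinese`), the prompt wording, the `encode_example` helper, and the toy question/passage/response triple are illustrative assumptions, not details taken from the paper.

```python
# Sketch of prefix-prompt fine-tuning for natural response generation.
# Assumptions: fnlp/bart-base-chinese as the backbone (it ships with a
# BertTokenizer) and two made-up prefix prompts; the actual Prompt-BART
# prompts and checkpoint may differ.
import random
from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

# Hypothetical mixture of prefix prompts; one is sampled per example
# and prepended to the question before encoding.
PREFIX_PROMPTS = [
    "请用完整的句子回答:",        # "Answer in a complete sentence:"
    "根据下文生成自然的回复:",    # "Generate a natural response from the passage:"
]

def encode_example(question, passage, response):
    """Encode one (question, passage, response) triple for seq2seq training."""
    prefix = random.choice(PREFIX_PROMPTS)
    inputs = tokenizer(prefix + question, passage,
                       truncation=True, max_length=512, return_tensors="pt",
                       return_token_type_ids=False)  # BART takes no segment ids
    labels = tokenizer(response, truncation=True, max_length=128,
                       return_tensors="pt",
                       return_token_type_ids=False).input_ids
    return inputs, labels

# Toy example (invented for illustration).
inputs, labels = encode_example(
    "企鹅主要生活在哪里?",                          # question
    "企鹅是一种不会飞的海鸟,主要分布在南半球。",    # passage
    "企鹅主要生活在南半球,比如南极洲的沿岸地区。",  # natural response
)
loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()                             # one fine-tuning gradient step
```

In this setup the backbone and training loss are unchanged from ordinary seq2seq fine-tuning; only the sampled prefix varies across examples, which is one plausible reading of "a mixture of prefix prompts."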