The growth of online consumer health questions has created a need for reliable and accurate question answering systems. A recent study showed that manually summarizing consumer health questions significantly improves the retrieval of relevant answers. However, automatically summarizing long questions is challenging because training data is scarce and the related subtasks, such as question focus and question type recognition, are complex. In this paper, we introduce a reinforcement learning-based framework for abstractive question summarization. We propose two novel rewards, obtained from the downstream tasks of (i) question-type identification and (ii) question-focus recognition, to regularize the question generation model. These rewards ensure the generation of semantically valid questions and encourage the inclusion of key medical entities/foci in the question summary. We evaluated our proposed method on two benchmark datasets and achieved higher performance than state-of-the-art models. A manual evaluation of the summaries reveals that the generated questions are more diverse and have fewer factual inconsistencies than the baseline summaries.
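As an illustration only, the following minimal Python sketch shows one way the two rewards described above could be combined into a single scalar and used in a REINFORCE-style loss with a baseline. All function names, the equal weights, the exact-match and overlap scoring, and the baseline formulation are assumptions made for this example; the abstract does not specify them, and in practice the question type and focus terms would come from trained downstream models rather than being passed in directly.

```python
from typing import Sequence, Set


def type_reward(pred_type: str, ref_type: str) -> float:
    """Hypothetical type reward: 1.0 if the question types match, else 0.0."""
    return 1.0 if pred_type == ref_type else 0.0


def focus_reward(pred_foci: Set[str], ref_foci: Set[str]) -> float:
    """Hypothetical focus reward: fraction of reference focus terms recovered."""
    if not ref_foci:
        return 0.0
    return len(pred_foci & ref_foci) / len(ref_foci)


def combined_reward(
    pred_type: str,
    ref_type: str,
    pred_foci: Set[str],
    ref_foci: Set[str],
    w_type: float = 0.5,   # assumed weight, not taken from the paper
    w_focus: float = 0.5,  # assumed weight, not taken from the paper
) -> float:
    """Weighted sum of the two downstream-task rewards."""
    return (
        w_type * type_reward(pred_type, ref_type)
        + w_focus * focus_reward(pred_foci, ref_foci)
    )


def policy_gradient_loss(
    log_probs: Sequence[float], reward: float, baseline: float
) -> float:
    """REINFORCE with a baseline: -(reward - baseline) * sum of token log-probs."""
    return -(reward - baseline) * sum(log_probs)


# Toy usage: the sampled summary has the right type but only one of two foci.
reward = combined_reward(
    "treatment", "treatment", {"migraine"}, {"migraine", "ibuprofen"}
)
loss = policy_gradient_loss([-0.5, -0.25], reward, baseline=0.25)
```

In this toy setup, a sampled summary that scores above the baseline yields a positive loss on its negative log-likelihood, so minimizing the loss raises the probability of summaries that preserve the question type and focus.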