Machine Reading Comprehension (MRC) aims to extract answers to questions from a given passage. It has been widely studied recently, especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to simultaneously predict answers to medical questions and the corresponding supporting sentences from medical information sources, in order to ensure the high reliability of medical knowledge services. A high-quality dataset, named the Multi-task Chinese Medical MRC dataset (CMedMRC), is manually constructed for this purpose, and a detailed analysis is conducted. We further propose a Chinese medical BERT model (CMedBERT) for this task, which fuses medical knowledge into pre-trained language models via a dynamic fusion mechanism over heterogeneous features and a multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
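The fusion of context-aware and knowledge-aware token representations mentioned above can be illustrated with a gated (convex) combination per token. This is a minimal NumPy sketch under assumed shapes and a sigmoid gate; the paper's actual dynamic fusion mechanism is not specified in the abstract, and the names here (`gated_fusion`, `W`, `b`) are hypothetical.

```python
import numpy as np

def gated_fusion(context_repr, knowledge_repr, W, b):
    """Hypothetical gated fusion of two token-representation tensors.

    context_repr:   (batch, seq_len, H) contextual (e.g. BERT) features
    knowledge_repr: (batch, seq_len, H) knowledge (e.g. medical KB) features
    W: (2H, H) gate weights, b: (H,) gate bias
    """
    # Concatenate both views and compute a per-token, per-dimension gate.
    z = np.concatenate([context_repr, knowledge_repr], axis=-1) @ W + b
    lam = 1.0 / (1.0 + np.exp(-z))  # sigmoid gate in (0, 1)
    # Convex combination: lam controls how much context vs. knowledge to keep.
    return lam * context_repr + (1.0 - lam) * knowledge_repr

# Toy usage: batch of 2 sequences, 4 tokens, hidden size 8.
rng = np.random.default_rng(0)
H = 8
ctx = rng.standard_normal((2, 4, H))
kg = rng.standard_normal((2, 4, H))
W = rng.standard_normal((2 * H, H)) * 0.1
b = np.zeros(H)
fused = gated_fusion(ctx, kg, W, b)
print(fused.shape)  # (2, 4, 8)
```

Because the gate lies in (0, 1), each fused value falls elementwise between the two input representations, so the model can smoothly interpolate between purely contextual and purely knowledge-driven features.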