In this paper, we introduce MeDiaQA, a novel question answering (QA) dataset constructed from real online medical dialogues. It contains 22k multiple-choice questions annotated by humans over 11k dialogues with 120k utterances between patients and doctors, covering 150 disease specialties, collected from haodf.com and dxy.com. MeDiaQA is the first QA dataset that requires reasoning over medical dialogues, especially over their quantitative contents. The dataset can test models' computing, reasoning, and understanding abilities across multi-turn dialogues, which is more challenging than existing datasets. To address these challenges, we design MeDia-BERT, which achieves 64.3% accuracy, compared with human performance of 93% accuracy, indicating that there remains large room for improvement.