Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Background: Extractive question answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating the answers in the patient's clinical notes. In realistic clinical EQA, a single question can have multiple answers, and one question can have multiple focus points; existing datasets for developing artificial intelligence solutions lack both properties.

Objective: To create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions.

Methods: We leveraged the annotated relations in the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, we included the 1-to-N, M-to-1, and M-to-N drug-reason relations to form multi-answer and multi-focus QA entries. These entries represent more complex and natural challenges, complementing the basic one-drug-one-reason cases. A baseline solution was developed and tested on the dataset.

Results: The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers, and 2% ask about multiple drugs within one question. Cues frequently appear around the answers in the text, and 90% of the drug and reason terms occur within the same sentence or an adjacent one. The baseline EQA solution achieved a best F1-measure of 0.72 on the entire dataset. On specific subsets, it achieved 0.93 on unanswerable questions, 0.48 on single-drug questions versus 0.60 on multi-drug questions, and 0.54 on single-answer questions versus 0.43 on multi-answer questions.

Discussion: The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. In particular, multi-answer EQA appears to be challenging and therefore warrants further research.
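The Methods paragraph describes turning drug-reason relations into QA entries of different shapes. The Python sketch that follows illustrates one possible way to do such a grouping; it is not the authors' pipeline. The question template, the `QAEntry` type, the `build_qa_entries` function, and the rule that drugs sharing an identical reason set form one multi-focus question are all hypothetical assumptions made for illustration. Under these assumptions, a drug with no annotated reason yields an unanswerable entry, a drug with one reason yields a single-answer entry, and a drug with several reasons (1-to-N) yields a multi-answer entry.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QAEntry:
    """One hypothetical QA entry; an empty `answers` list marks it unanswerable."""
    question: str
    drugs: tuple[str, ...]
    answers: list[str] = field(default_factory=list)


def build_qa_entries(drugs: list[str], relations: list[tuple[str, str]]) -> list[QAEntry]:
    """Turn (drug, reason) relation pairs from one note into QA entries.

    Hypothetical grouping rules (an assumption, not the RxWhyQA specification):
      * one entry per drug: 0 reasons -> unanswerable, 1 -> single-answer,
        N > 1 -> multi-answer (1-to-N relation);
      * one extra multi-focus entry per group of >= 2 drugs that share an
        identical, non-empty reason set (M-to-1 or M-to-N relation).
    """
    unique_drugs = list(dict.fromkeys(drugs))  # de-duplicate, keep first-seen order

    reasons_by_drug: defaultdict[str, set[str]] = defaultdict(set)
    for drug, reason in relations:
        reasons_by_drug[drug].add(reason)

    entries: list[QAEntry] = []

    # Single-drug (single-focus) questions; answers are sorted for determinism.
    for drug in unique_drugs:
        answers = sorted(reasons_by_drug.get(drug, set()))
        entries.append(QAEntry(f"Why was the patient prescribed {drug}?", (drug,), answers))

    # Multi-drug (multi-focus) questions: group drugs by their full reason set.
    drugs_by_reason_set: defaultdict[frozenset[str], list[str]] = defaultdict(list)
    for drug in unique_drugs:
        reason_set = frozenset(reasons_by_drug.get(drug, set()))
        if reason_set:  # a drug without a reason cannot share an answer
            drugs_by_reason_set[reason_set].append(drug)

    for reason_set, group in drugs_by_reason_set.items():
        if len(group) > 1:
            question = f"Why was the patient prescribed {' and '.join(group)}?"
            entries.append(QAEntry(question, tuple(group), sorted(reason_set)))

    return entries


if __name__ == "__main__":
    # Toy example (invented for illustration, not taken from the n2c2 corpus).
    toy_drugs = ["lisinopril", "metoprolol", "aspirin", "acetaminophen"]
    toy_relations = [
        ("lisinopril", "hypertension"),
        ("metoprolol", "hypertension"),
        ("aspirin", "CAD"),
        ("aspirin", "stroke prevention"),
    ]
    for entry in build_qa_entries(toy_drugs, toy_relations):
        print(entry.question, entry.answers)
    # Why was the patient prescribed lisinopril? ['hypertension']
    # Why was the patient prescribed metoprolol? ['hypertension']
    # Why was the patient prescribed aspirin? ['CAD', 'stroke prevention']
    # Why was the patient prescribed acetaminophen? []
    # Why was the patient prescribed lisinopril and metoprolol? ['hypertension']
```

Grouping by identical reason sets is a simplification: it ignores where the mentions occur in the note, and drugs that share only some of their reasons never end up in the same multi-focus question.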
