This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs drawn from both the structured tables and the unstructured notes of a publicly available Electronic Health Record (EHR). EHRs store patient records in structured tables and unstructured clinical notes. The information in these two sources is not strictly disjoint: it may be duplicated or contradictory, or one source may provide additional context for the other. Our dataset contains over 70,000 medication-related question-answer pairs. To provide a baseline model and help analyze the dataset, we use a simple model (MultimodalEHRQA) in which a modality selection network predicts whether a question should be answered from the EHR tables or from the clinical notes, and the question is then directed to the corresponding state-of-the-art table-based or text-based QA model. To address the problems posed by complex, nested queries, we use Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL); this is the first time RAT-SQL has been used to test the structure of query templates in EHR data. Our goal is to provide a benchmark dataset for multi-modal QA systems and to open new avenues of research in improving question answering over structured EHR data by using context from unstructured clinical data.
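The routing step of MultimodalEHRQA can be illustrated with a minimal sketch. It makes several assumptions and does not reproduce the paper's implementation. The selector, the two QA callables, and the modality labels `"table"` and `"text"` are hypothetical placeholders. The keyword heuristic merely stands in for the trained modality selection network. The only idea taken from the abstract is that a per-question modality prediction decides which QA model receives the question.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical interfaces (not from the DrugEHRQA paper):
# the selector maps a question to a score per modality,
# and each QA model maps a question to an answer string.
ModalitySelector = Callable[[str], Dict[str, float]]
QAModel = Callable[[str], str]


@dataclass
class MultimodalRouter:
    select_modality: ModalitySelector
    table_qa: QAModel  # e.g. a text-to-SQL model run against EHR tables
    text_qa: QAModel   # e.g. an extractive reader over clinical notes

    def answer(self, question: str) -> Tuple[str, str]:
        """Return (chosen modality, answer) for a single question."""
        scores = self.select_modality(question)
        modality = max(scores, key=scores.get)  # highest-scoring modality wins
        if modality == "table":
            return modality, self.table_qa(question)
        return modality, self.text_qa(question)


def toy_selector(question: str) -> Dict[str, float]:
    # Hypothetical keyword heuristic standing in for a trained classifier.
    if question.lower().startswith("how many"):
        return {"table": 0.8, "text": 0.2}
    return {"table": 0.3, "text": 0.7}


router = MultimodalRouter(
    select_modality=toy_selector,
    table_qa=lambda q: "SQL answer",  # placeholder table-based QA model
    text_qa=lambda q: "note span",    # placeholder text-based QA model
)
print(router.answer("How many patients received heparin?"))
```

Keeping the selector and both QA models as plain callables separates routing from answering. Under this assumption, either QA backend could be replaced without changing the routing logic.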