EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff, including physicians, nurses, insurance review and health records teams, and more. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and templatized the responses to create seed questions. Then, we manually linked them to two open-source EHR databases, MIMIC-III and eICU, and included them with various time expressions and held-out unanswerable questions in the dataset, which were all collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable based on the prediction confidence. We believe our dataset, EHRSQL, could serve as a practical benchmark to develop and assess QA models on structured EHR data and take one step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.

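To make the second and third challenges more concrete, here is a minimal Python sketch under several assumptions. The `admissions` table is a tiny toy stand-in, not the real MIMIC-III or eICU schema. The entropy threshold is only one plausible reading of "prediction confidence" and is not confirmed as the authors' method. The function names, the threshold value, and the probability lists are all hypothetical.

```python
import math
import sqlite3


def token_entropy(dist):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def is_answerable(token_dists, threshold=0.5):
    """Assumed rule: treat the question as answerable only if the most
    uncertain decoding step stays at or below the entropy threshold."""
    return max(token_entropy(d) for d in token_dists) <= threshold


def build_toy_db():
    """Create an in-memory SQLite database with a toy admissions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (subject_id INTEGER, admittime TEXT)")
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?)",
        [
            (1, "2105-03-01 10:00:00"),
            (2, "2104-12-30 08:00:00"),
            (3, "2105-07-15 09:30:00"),
        ],
    )
    return conn


def answer(conn, sql, token_dists, threshold=0.5):
    """Run the predicted SQL if the model looks confident, else abstain (None)."""
    if not is_answerable(token_dists, threshold):
        return None
    return conn.execute(sql).fetchall()


# Time-sensitive question: "How many patients were admitted in 2105?"
SQL = (
    "SELECT COUNT(DISTINCT subject_id) FROM admissions "
    "WHERE strftime('%Y', admittime) = '2105'"
)

# Hypothetical per-step token distributions from a text-to-SQL model.
CONFIDENT = [[0.97, 0.02, 0.01], [0.99, 0.01]]
UNCERTAIN = [[0.40, 0.35, 0.25]]

if __name__ == "__main__":
    conn = build_toy_db()
    print(answer(conn, SQL, CONFIDENT))  # query is executed
    print(answer(conn, SQL, UNCERTAIN))  # abstains and returns None
```

Abstaining on a high-entropy prediction trades coverage for safety. In a hospital setting, a system in this style would refuse a question rather than return the result of a doubtful query. The threshold would need tuning on held-out answerable and unanswerable questions.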