EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

January 16, 2023

Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi

From arXiv. Published as a conference paper at NeurIPS 2022 (Track on Datasets and Benchmarks).

We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff, including physicians, nurses, and the insurance review and health records teams, among others. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and templatized the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and augmented the dataset with various time expressions and held-out unanswerable questions, all of which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, from simple retrieval to complex operations such as calculating survival rates, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable based on its prediction confidence. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data, and as a step toward bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.
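The abstract describes seed questions templatized from poll responses and paired with time expressions. A minimal Python sketch of that idea, under stated assumptions, fills a question template and its SQL skeleton together, resolving a relative time phrase into a concrete date window. The `admissions` table, the `TEMPLATE` dictionary, and the helpers `resolve_time_phrase` and `instantiate` are all hypothetical; the toy table is not the MIMIC-III or eICU schema, and the two time rules are illustrative only.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical toy table, loosely inspired by an EHR admissions table.
# It is NOT the actual MIMIC-III or eICU schema used by EHRSQL.
SCHEMA = """
CREATE TABLE admissions (
    patient_id INTEGER,
    admit_time TEXT  -- 'YYYY-MM-DD HH:MM:SS', so string order matches time order
);
"""

# Hypothetical seed template pairing a question with a SQL skeleton.
TEMPLATE = {
    "question": "How many times was patient {patient_id} admitted {time_phrase}?",
    "sql": ("SELECT COUNT(*) FROM admissions WHERE patient_id = {patient_id} "
            "AND admit_time >= '{start}' AND admit_time < '{end}'"),
}

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_time_phrase(phrase, now):
    """Map a relative time expression to a half-open [start, end) window.

    Only two illustrative rules are implemented (an assumption of this sketch).
    """
    if phrase == "in the past 30 days":
        return now - timedelta(days=30), now
    if phrase == "this year":
        return datetime(now.year, 1, 1), datetime(now.year + 1, 1, 1)
    raise ValueError(f"unsupported time expression: {phrase!r}")


def instantiate(template, patient_id, time_phrase, now):
    """Fill the template's slots, producing a (question, SQL) pair."""
    start, end = resolve_time_phrase(time_phrase, now)
    question = template["question"].format(patient_id=patient_id, time_phrase=time_phrase)
    # Values are trusted here; literals are inlined because the target of
    # text-to-SQL is a complete query string.
    sql = template["sql"].format(
        patient_id=int(patient_id),
        start=start.strftime(TS_FORMAT),
        end=end.strftime(TS_FORMAT),
    )
    return question, sql


# Usage on a tiny in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [
        (10, "2021-03-01 08:00:00"),
        (10, "2021-06-15 12:30:00"),
        (10, "2020-11-20 09:00:00"),
        (11, "2021-06-20 10:00:00"),
    ],
)
now = datetime(2021, 6, 30)  # fixed "current time" so relative phrases are reproducible
for phrase in ("in the past 30 days", "this year"):
    question, sql = instantiate(TEMPLATE, 10, phrase, now)
    print(question, "->", conn.execute(sql).fetchone()[0])
```

Pinning `now` is a deliberate choice in this sketch: a phrase such as "in the past 30 days" only has a well-defined answer relative to some reference time, so the same question can map to different SQL if that reference changes.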

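The abstract states that a model must decide whether a question is answerable based on its prediction confidence, without specifying the score here. One possible sketch, assuming a SQL generator that exposes per-token log-probabilities, uses the geometric-mean token probability as the confidence, abstains below a threshold, and picks that threshold on a labeled validation set. `Prediction`, `sequence_confidence`, `answer_or_abstain`, and `choose_threshold` are hypothetical names, the validation outputs are made up, and this is not presented as the EHRSQL authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Prediction:
    """A generated SQL query plus per-token log-probabilities (hypothetical model output)."""
    sql: str
    token_logprobs: List[float]


def sequence_confidence(pred: Prediction) -> float:
    """Geometric-mean token probability, exp(mean log-prob): one assumed confidence score."""
    if not pred.token_logprobs:
        return 0.0
    return math.exp(sum(pred.token_logprobs) / len(pred.token_logprobs))


def answer_or_abstain(pred: Prediction, threshold: float) -> Optional[str]:
    """Return the SQL if confidence reaches the threshold, else None (treat as unanswerable)."""
    return pred.sql if sequence_confidence(pred) >= threshold else None


def choose_threshold(preds: Sequence[Prediction], gold: Sequence[Optional[str]],
                     candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest validation accuracy.

    gold[i] is the reference SQL, or None if question i is unanswerable. A prediction
    counts as correct if it exactly matches the reference SQL, or abstains on an
    unanswerable question. (Exact string match is a simplification.)
    """
    def accuracy(t: float) -> float:
        hits = sum(answer_or_abstain(p, t) == g for p, g in zip(preds, gold))
        return hits / len(gold)
    return max(candidates, key=accuracy)  # ties go to the earliest candidate


# Usage with made-up validation outputs.
preds = [
    Prediction("SELECT COUNT(*) FROM admissions", [-0.05, -0.10, -0.02]),  # confident
    Prediction("SELECT name FROM weather", [-1.2, -2.0, -1.5]),            # unsure
]
gold = ["SELECT COUNT(*) FROM admissions", None]  # second question is unanswerable
t = choose_threshold(preds, gold, [0.1, 0.5, 0.9, 0.99])
print(t, [answer_or_abstain(p, t) for p in preds])  # → 0.5 ['SELECT COUNT(*) FROM admissions', None]
```

Averaging log-probabilities rather than summing them keeps the score from penalizing longer queries simply for having more tokens; other confidence measures could be swapped in without changing the thresholding logic.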
