Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding - Zhuanzhi Papers

Topics: Language understanding · Large language models · Differential · Relation extraction · Language models

April 13, 2023

Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

Yuqing Wang, Yun Zhao, Linda Petzold

From arXiv, 19 pages, preprint

Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.
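The abstract describes self-questioning prompting only at a high level: the model is led to produce questions and answers about the clinical scenario before completing the task. A minimal Python sketch of how such a prompt might be assembled follows. The function name `build_sqp_prompt`, its parameters, and the prompt wording are all hypothetical illustrations, not the authors' actual template.

```python
def build_sqp_prompt(task_instruction: str, clinical_text: str, num_questions: int = 3) -> str:
    """Assemble a self-questioning style prompt.

    The wording is an assumption made for illustration; the abstract does
    not give the template used in the paper.
    """
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    steps = [
        # Step 1: have the model pose its own questions about the input.
        f"1. Write {num_questions} informative questions about the clinical text below.",
        # Step 2: have it answer those questions from the text alone.
        "2. Answer each question using only the clinical text.",
        # Step 3: complete the actual task, grounded in the questions and answers.
        f"3. Using those questions and answers, {task_instruction}",
    ]
    return "\n".join(["Clinical text:", clinical_text.strip(), "", "Steps:", *steps])


if __name__ == "__main__":
    print(
        build_sqp_prompt(
            "state the relation between the two drugs mentioned.",
            "The patient was started on warfarin while taking aspirin.",
        )
    )
```

The returned string would be sent to a model as a single prompt. Keeping the task instruction as the final step is a design choice in this sketch, so that the generated questions and answers precede the final output.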
