Exploring the Potential of Large Language Models in Traditional Korean Medicine: A Foundation Model Approach to Culturally-Adapted Healthcare

Introduction: Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment, which makes AI modeling difficult because data are limited and the clinical reasoning is largely implicit. The large language models GPT-3.5 and GPT-4 have shown impressive medical knowledge despite lacking medicine-specific training. This study aimed to assess the capabilities of GPT-3.5 and GPT-4 for TKM using the Korean National Licensing Examination for Korean Medicine Doctors. Methods: The GPT-3.5 (February 2023) and GPT-4 (March 2023) models answered 340 questions from the 2022 examination across 12 subjects. Each question was evaluated five times independently, each time in a newly initialized session. Results: GPT-3.5 and GPT-4 achieved 42.06% and 57.29% accuracy, respectively, with GPT-4 nearing passing performance. Accuracy differed significantly by subject, with 83.75% for neuropsychiatry compared to 28.75% for internal medicine (2). Both models showed high accuracy on recall-based and diagnosis-based questions but struggled with intervention-based ones. Accuracy on questions that require TKM-specialized knowledge was lower than on questions that do not. GPT-4 showed high accuracy on table-based questions, and both models gave consistent responses. A positive correlation between consistency and accuracy was observed. Conclusion: The models in this study showed near-passing performance in decision-making for TKM without domain-specific training. However, limitations were also observed, which are believed to be caused by culturally biased learning. Our study suggests that foundation models have potential in culturally adapted medicine, specifically TKM, for clinical assistance, medical education, and medical research.

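The evaluation described in the abstract (five independent trials per question, then accuracy, consistency, and their correlation) can be illustrated with a minimal sketch. The abstract does not define how consistency was measured, so the definition used here (the share of trials agreeing with the most frequent response) is an assumption, as are the function names and the toy data. The reported figures do appear consistent with pooling all 340 × 5 = 1,700 trials (for example, 715/1700 ≈ 42.06%), but that is an inference from the numbers, not something the abstract states.

```python
from collections import Counter
from math import sqrt


def question_stats(responses, answer):
    """Return (accuracy, consistency) for one question's repeated trials."""
    n = len(responses)
    accuracy = sum(r == answer for r in responses) / n
    # Assumed definition: fraction of trials matching the most common response.
    consistency = Counter(responses).most_common(1)[0][1] / n
    return accuracy, consistency


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: question id -> (five model responses, answer key).
trials = {
    "q1": ([3, 3, 3, 3, 3], 3),
    "q2": ([1, 2, 1, 4, 1], 1),
    "q3": ([5, 5, 2, 5, 5], 2),
    "q4": ([2, 4, 1, 3, 5], 4),
}

stats = {q: question_stats(resp, ans) for q, (resp, ans) in trials.items()}

# Overall accuracy pooled over every trial of every question.
total_correct = sum(sum(r == ans for r in resp) for resp, ans in trials.values())
total_trials = sum(len(resp) for resp, _ in trials.values())
overall_accuracy = total_correct / total_trials

# Correlation between per-question accuracy and per-question consistency.
r = pearson([a for a, _ in stats.values()], [c for _, c in stats.values()])
```

Note that a question can be answered consistently but wrongly (q3 above), which is why consistency and accuracy are tracked separately before being correlated.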
