GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam

The Plastic Surgery In-Service Training Exam (PSITE) is an important indicator of resident proficiency and serves as a useful benchmark for evaluating OpenAI's GPT. Unlike many of the simulated tests or practice questions shown in the GPT-4 Technical Report, the multiple-choice questions evaluated here are authentic PSITE questions. These questions present realistic clinical vignettes that a plastic surgeon commonly encounters in practice, and PSITE scores correlate highly with passing the written boards required to become a Board Certified Plastic Surgeon. Our evaluation shows a dramatic improvement of GPT-4 (without vision) over GPT-3.5: on the 2022 exam the score rises from the 8th to the 88th percentile, and on the 2021 exam from the 3rd to the 99th percentile. The final results of the 2023 PSITE are set to be released on April 11, 2023, which makes this an exciting moment to continue our research with a fresh exam. Our evaluation pipeline is ready to run as soon as the exam is released, provided we have access to the GPT-4 API via OpenAI. With multimodal input, we may achieve superhuman performance on the 2023 exam.
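
The abstract does not describe how the evaluation pipeline is built, so the following is only a minimal sketch of how a multiple-choice exam could be scored and turned into a percentile. Every name here (`score_exam`, `percentile_rank`, `ask_model`) is hypothetical, the model call is left as a plain callable rather than a real API client, and the percentile is computed from an assumed list of reference scores rather than from the official PSITE norm tables.

```python
from bisect import bisect_left
from typing import Callable, Mapping, Sequence


def score_exam(
    questions: Mapping[str, str],
    answer_key: Mapping[str, str],
    ask_model: Callable[[str], str],
) -> float:
    """Return the fraction of questions the model answers correctly.

    `ask_model` is a hypothetical stand-in for whatever sends a prompt
    to the model and returns its chosen letter.
    """
    correct = 0
    for qid, prompt in questions.items():
        # Normalize the reply so "b", " B " and "B" all compare equal.
        choice = ask_model(prompt).strip().upper()[:1]
        if choice == answer_key[qid]:
            correct += 1
    return correct / len(questions)


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores strictly below `score`.

    This is one common definition of a percentile rank; the exam's
    official norm tables may use a different convention.
    """
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)
```

Passing the model as a callable keeps the scoring logic independent of any particular API, so the same function could be run against GPT-3.5, GPT-4, or a fixed stub for testing.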
