In this paper, we propose a novel approach (called GPT4MIA) that utilizes a Generative Pre-trained Transformer (GPT) as a plug-and-play transductive inference tool for medical image analysis (MIA). We provide a theoretical analysis of why a large pre-trained language model such as GPT-3 can serve as a plug-and-play transductive inference model for MIA. At the methodological level, we develop several technical treatments to improve the efficiency and effectiveness of GPT4MIA, including better prompt structure design, sample selection, and prompt ordering of representative samples/features. We present two concrete use cases (with workflows) of GPT4MIA: (1) detecting prediction errors and (2) improving prediction accuracy, working in conjunction with well-established vision-based models for image classification (e.g., ResNet). Experiments validate that our proposed method is effective for these two tasks. We further discuss the opportunities and challenges of utilizing Transformer-based large language models for broader MIA applications.