The emergence of vision-language models (VLMs) has opened new possibilities for clinical reasoning, and VLMs have shown promising performance in dermatological diagnosis. However, their trustworthiness and clinical utility are often limited by three major factors: (1) data heterogeneity, where diverse datasets lack consistent diagnostic labels and clinical concept annotations; (2) the absence of grounded diagnostic rationales, which leaves reliable reasoning supervision scarce; and (3) limited scalability and generalization, as models trained on small, densely annotated datasets struggle to transfer nuanced reasoning to large, sparsely annotated ones. To address these limitations, we propose SkinR1, a novel dermatological VLM that combines deep, textbook-based reasoning with the broad generalization capabilities of reinforcement learning (RL). SkinR1 systematically addresses these challenges through a unified, end-to-end framework. First, we design a textbook-based reasoning generator that synthesizes high-fidelity, hierarchy-aware, and differential-diagnosis (DDx)-informed trajectories, providing reliable expert-level supervision. Second, we use the constructed trajectories for supervised fine-tuning (SFT), equipping the model with grounded reasoning ability. Third, we develop a novel RL paradigm that incorporates the hierarchical structure of diseases to effectively transfer these grounded reasoning patterns to large-scale, sparsely annotated data. Extensive experiments on multiple dermatology datasets demonstrate that SkinR1 achieves superior diagnostic accuracy. An ablation study further demonstrates the importance of the reasoning foundation instilled by SFT.
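The abstract does not specify how the disease hierarchy enters the RL objective. One common way to make a reward "hierarchy-aware" is to grant partial credit when a prediction is wrong at the leaf level but shares an ancestor category with the gold label. The sketch below illustrates that idea only; the toy taxonomy (`PARENT`), the helper names `ancestors` and `hierarchical_reward`, and the halving-per-level credit scheme are all hypothetical assumptions, not SkinR1's actual reward.

```python
# Hypothetical toy taxonomy (child -> parent); NOT taken from SkinR1.
PARENT = {
    "melanoma": "malignant melanocytic",
    "malignant melanocytic": "malignant",
    "basal cell carcinoma": "malignant keratinocytic",
    "malignant keratinocytic": "malignant",
    "nevus": "benign melanocytic",
    "benign melanocytic": "benign",
}


def ancestors(label: str) -> list[str]:
    """Return [label, parent, grandparent, ...] up to the root of the toy taxonomy."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hierarchical_reward(pred: str, gold: str) -> float:
    """Hypothetical hierarchy-aware reward (an assumption, not the paper's).

    Walk up from the gold label; at the first node that is also on the
    prediction's ancestor chain, return 1 / 2**depth, so credit halves for
    each level the shared ancestor sits above the gold label. A prediction
    with no shared ancestor earns 0.
    """
    pred_chain = set(ancestors(pred))
    for depth, node in enumerate(ancestors(gold)):
        if node in pred_chain:
            return 1.0 / (2 ** depth)
    return 0.0


if __name__ == "__main__":
    for pred in ["melanoma", "malignant melanocytic", "basal cell carcinoma", "nevus"]:
        print(pred, hierarchical_reward(pred, "melanoma"))
```

With gold label "melanoma", an exact match scores 1.0, the coarser "malignant melanocytic" scores 0.5, a different malignancy scores 0.25, and a benign nevus scores 0.0. This graded signal is one plausible way an RL objective could reward clinically "close" answers on sparsely annotated data where only coarse labels are available.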

