Recent advances in dermatological image analysis have been driven by large-scale annotated datasets; however, most existing benchmarks focus on dermatoscopic images and lack patient-authored queries and clinical context, limiting their applicability to patient-centered care. To address this gap, we introduce DermaVQA-DAS, an extension of the DermaVQA dataset that supports two complementary tasks: closed-ended question answering (QA) and dermatological lesion segmentation. Central to this work is the Dermatology Assessment Schema (DAS), a novel expert-developed framework that systematically captures clinically meaningful dermatological features in a structured and standardized form. DAS comprises 36 high-level and 27 fine-grained assessment questions, with multiple-choice options in English and Chinese. Leveraging DAS, we provide expert-annotated datasets for both closed-ended QA and segmentation and benchmark state-of-the-art multimodal models. For segmentation, we evaluate multiple prompting strategies and show that the best-performing prompt depends on the aggregation scheme: the default prompt achieves the best results under the Mean-of-Max and Mean-of-Mean aggregation schemes, while an augmented prompt incorporating both the patient query title and content yields the highest performance under majority-vote-based microscore evaluation, achieving a Jaccard index of 0.395 and a Dice score of 0.566 with BiomedParse. For closed-ended QA, overall performance is strong across models, with average accuracies ranging from 0.729 to 0.798; o3 achieves the best overall accuracy (0.798), closely followed by GPT-4.1 (0.796), while Gemini-1.5-Pro shows competitive performance within the Gemini family (0.783). We publicly release DermaVQA-DAS, the DAS, and evaluation protocols to support and accelerate future research in patient-centered dermatological vision-language modeling (https://osf.io/72rp3).
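To make the segmentation metrics concrete, the following is a minimal Python sketch of how a majority-vote reference mask and pooled ("micro") Jaccard and Dice scores could be computed. It is an illustration under stated assumptions, not the released evaluation protocol: the function names (`majority_vote`, `overlap_counts`, `jaccard`, `dice`, `micro_scores`), the strict-majority rule, and the toy masks are all hypothetical, and the exact definitions of the Mean-of-Max, Mean-of-Mean, and microscore schemes are not given in this abstract. One fact that can be checked directly: when both metrics are computed from the same overlap counts, Dice = 2J / (1 + J), and the reported pair is consistent with this (2 × 0.395 / 1.395 ≈ 0.566).

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask: 1 = lesion pixel, 0 = background


def majority_vote(masks: Sequence[Mask]) -> Mask:
    """Assumed rule: a pixel is lesion if a strict majority of annotators marked it."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def overlap_counts(pred: Mask, gold: Mask) -> Tuple[int, int, int]:
    """Return (intersection size, predicted area, reference area) in pixels."""
    inter = sum(p & g for pr, gr in zip(pred, gold) for p, g in zip(pr, gr))
    pred_area = sum(sum(row) for row in pred)
    gold_area = sum(sum(row) for row in gold)
    return inter, pred_area, gold_area


def jaccard(inter: int, pred_area: int, gold_area: int) -> float:
    """Intersection over union; two empty masks are treated as a perfect match."""
    union = pred_area + gold_area - inter
    return inter / union if union else 1.0


def dice(inter: int, pred_area: int, gold_area: int) -> float:
    """Twice the intersection over the summed areas; empty masks score 1.0."""
    total = pred_area + gold_area
    return 2 * inter / total if total else 1.0


def micro_scores(pairs: Sequence[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Pool pixel counts over all (prediction, reference) pairs, then score once."""
    inter = pred_area = gold_area = 0
    for pred, gold in pairs:
        i, p, g = overlap_counts(pred, gold)
        inter, pred_area, gold_area = inter + i, pred_area + p, gold_area + g
    return jaccard(inter, pred_area, gold_area), dice(inter, pred_area, gold_area)


# Toy example: three hypothetical annotator masks and one model prediction.
annotator_masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
prediction = [[1, 1, 1], [0, 0, 0]]

reference = majority_vote(annotator_masks)
j_score, d_score = micro_scores([(prediction, reference)])
print(reference, j_score, d_score)
```

Pooling counts before scoring weights large lesions more heavily than averaging per-image scores would, which is one plausible reason the best prompt can differ between aggregation schemes.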