Domain shifts in dermoscopic skin cancer datasets: Evaluation of essential limitations for clinical translation

Convolutional Neural Networks (CNNs) generalize poorly to images from previously unseen domains, which is a major limitation for safety-critical clinical tasks such as dermoscopic skin cancer classification. To translate CNN-based applications into the clinic, it is essential that they can adapt to domain shifts. Such shifts can arise from different image acquisition systems or varying lighting conditions. In dermoscopy, they can also arise from a change in patient age or the occurrence of rare lesion localizations (e.g. palms). These conditions are not prominently represented in most training datasets and can therefore lead to a decrease in performance. To verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data that mimics such domain shifts. To our knowledge, no dermoscopic image dataset exists in which such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do in fact exist in most of our grouped domains. Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
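The metadata-based grouping described in the abstract can be pictured with a small sketch. This is not the authors' pipeline. The record fields (`site`, `localization`, `age`), the sample values, and the age threshold of 30 are all hypothetical and chosen only for illustration. They do not reflect the actual ISIC metadata schema or the domains defined in the paper.

```python
from collections import defaultdict

# Hypothetical metadata records. Field names and values are illustrative
# assumptions, not the real ISIC archive schema.
RECORDS = [
    {"image_id": "img_001", "site": "clinic_A", "localization": "torso", "age": 45},
    {"image_id": "img_002", "site": "clinic_A", "localization": "palms/soles", "age": 52},
    {"image_id": "img_003", "site": "clinic_B", "localization": "torso", "age": 24},
    {"image_id": "img_004", "site": "clinic_B", "localization": "torso", "age": 28},
]


def age_group(age, threshold=30):
    """Bucket a patient age into two groups (threshold is an arbitrary assumption)."""
    return "young" if age <= threshold else "older"


def domain_key(record):
    """Build a domain label from acquisition site, lesion localization, and age group."""
    return (record["site"], record["localization"], age_group(record["age"]))


def group_into_domains(records):
    """Map each domain label to the list of image ids that fall into it."""
    domains = defaultdict(list)
    for record in records:
        domains[domain_key(record)].append(record["image_id"])
    return dict(domains)


domains = group_into_domains(RECORDS)
for key, image_ids in sorted(domains.items()):
    print(key, image_ids)
```

Each resulting group could then be treated as a candidate domain whose distance to the training data is measured separately. Which measures to use is left open here, since the abstract does not name them.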

