The limited ability of Convolutional Neural Networks (CNNs) to generalize to images from previously unseen domains is a major obstacle, particularly for safety-critical clinical tasks such as dermoscopic skin cancer classification. To translate CNN-based applications into the clinic, it is essential that they can adapt to domain shifts. Such shifts can arise from the use of different image acquisition systems or from varying lighting conditions. In dermoscopy, shifts can also stem from changes in patient age or from the occurrence of rare lesion localizations (e.g. palms). These cases are not prominently represented in most training datasets and can therefore lead to a decrease in performance. To verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data that mimics such domain shifts. To our knowledge, no dermoscopic image dataset exists in which such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed classification performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do in fact exist in most of our grouped domains. Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
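The grouping and quantification steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields (`image_id`, `site`, `localization`, `age`), the helper names, and the example values are hypothetical, and the Jensen-Shannon divergence is used here only as one possible stand-in for a shift measure, since the abstract does not name the measures actually used.

```python
import math
from collections import defaultdict


def group_into_domains(records, keys):
    """Group image IDs by the tuple of metadata values found under `keys`.

    `records` is a list of dicts with hypothetical fields; each distinct
    combination of values becomes one candidate domain.
    """
    domains = defaultdict(list)
    for rec in records:
        domain_key = tuple(rec[k] for k in keys)
        domains[domain_key].append(rec["image_id"])
    return dict(domains)


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as equal-length lists of probabilities.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with disjoint support.
    """
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical metadata records, for illustration only.
records = [
    {"image_id": "img_001", "site": "site_A", "localization": "torso", "age": 25},
    {"image_id": "img_002", "site": "site_A", "localization": "palms/soles", "age": 60},
    {"image_id": "img_003", "site": "site_B", "localization": "torso", "age": 45},
    {"image_id": "img_004", "site": "site_A", "localization": "torso", "age": 70},
]

# One candidate domain per (acquisition site, lesion localization) pair.
domains = group_into_domains(records, ["site", "localization"])

# Compare two made-up feature histograms, e.g. binned image statistics
# computed separately for a source domain and a target domain.
shift = js_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

In practice, the histograms passed to `js_divergence` would be computed from image features of each domain rather than typed in by hand; a value near 0 would suggest little shift between two groups, and a value near 1 a strong one.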