The limited ability of Convolutional Neural Networks (CNNs) to generalize to images from previously unseen domains is a major obstacle, particularly for safety-critical clinical tasks such as dermoscopic skin cancer classification. To translate CNN-based applications into the clinic, it is essential that they can adapt to domain shifts. Such shifts can arise from the use of different image acquisition systems or from varying lighting conditions. In dermoscopy, shifts can also arise from changes in patient age or from the occurrence of rare lesion localizations (e.g., palms). Such cases are underrepresented in most training datasets and can therefore lead to a decrease in performance. To verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data that mimics such domain shifts. To our knowledge, no dermoscopic image dataset exists in which such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g., acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed classification performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do in fact exist in most of our grouped domains. Based on our results, we believe these datasets will be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
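As an illustration of the metadata-based grouping described above, the following is a minimal sketch and not the authors' actual pipeline. The field names (`image_id`, `acquisition_site`, `anatom_site`, `age`), the domain labels, the `"palms/soles"` value, and the age cutoff of 30 are all assumptions chosen for the example; the real ISIC metadata schema and the grouping rules used in the study may differ.

```python
from collections import defaultdict


def assign_domain(record, age_cutoff=30):
    """Map one metadata record to a hypothetical domain label.

    All rules here are illustrative assumptions: rare lesion
    localization takes priority, then patient age, then the
    acquisition site.
    """
    if record.get("anatom_site") == "palms/soles":
        return "rare_localization"
    age = record.get("age")
    if age is not None and age <= age_cutoff:
        return "young_patients"
    return record.get("acquisition_site", "unknown")


def group_into_domains(records, age_cutoff=30):
    """Group image IDs by the domain label assigned to their metadata."""
    domains = defaultdict(list)
    for rec in records:
        domains[assign_domain(rec, age_cutoff)].append(rec["image_id"])
    return dict(domains)


# Toy metadata records (fabricated for illustration, not ISIC data).
records = [
    {"image_id": "img_001", "acquisition_site": "clinic_a",
     "anatom_site": "torso", "age": 55},
    {"image_id": "img_002", "acquisition_site": "clinic_b",
     "anatom_site": "torso", "age": 25},
    {"image_id": "img_003", "acquisition_site": "clinic_a",
     "anatom_site": "palms/soles", "age": 60},
    {"image_id": "img_004", "acquisition_site": "clinic_b",
     "anatom_site": "lower extremity", "age": 70},
]

domains = group_into_domains(records)
for name, image_ids in sorted(domains.items()):
    print(name, image_ids)
```

In a sketch like this, each resulting group could then serve as a separate test domain for a classifier trained on the remaining data. How overlapping criteria are prioritized (here: localization, then age, then site) is a design choice that would need to follow the study's own definitions.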