Recent advances in deep learning have led to breakthroughs in automated skin disease classification. As interest in these models grows in dermatology, it is crucial to address aspects such as robustness to shifts in the input data distribution. Current skin disease models can make incorrect inferences on test samples that are out-of-distribution (OOD) with respect to the training samples, such as samples acquired with different hardware devices or in different clinical settings, or samples of unknown diseases. To this end, we propose a simple yet effective approach that detects these OOD samples before any decision is made. Detection is performed by scanning the latent space representation (e.g., the activations of the inner layers of any pre-trained skin disease classifier). The input samples can also be perturbed to maximise the divergence of OOD samples. We validate our OOD detection approach in two use cases: 1) identifying samples collected under different protocols, and 2) detecting samples from unknown disease classes. We also evaluate the performance of the proposed approach and compare it with other state-of-the-art methods. Furthermore, data-driven dermatology applications may deepen disparities in clinical care across racial and ethnic groups, since most datasets are reported to be biased in their skin tone distribution. We therefore also evaluate the fairness of these OOD detection methods across different skin tones. Our experiments show competitive performance in detecting OOD samples across multiple datasets, which could in future be used to design more effective transfer learning techniques before inferring on these samples.
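The abstract does not spell out how the latent space is scanned. One common way to scan inner-layer activations for anomalies is subset scanning. Each node's activation is converted into an empirical p-value against activations from in-distribution training images. The method then searches for the subset of nodes whose p-values are jointly most surprising under a nonparametric scan statistic such as Berk-Jones. The sketch below assumes that technique and is not presented as the authors' implementation. It uses random arrays as hypothetical stand-ins for layer activations and omits the input-perturbation step. The function names, the one-tailed p-value, `alpha_max = 0.5` and the 95th-percentile threshold are illustrative assumptions.

```python
import numpy as np


def empirical_pvalues(background: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One-tailed empirical p-value per node (assumption: larger = more anomalous).

    background: (M, J) activations of M in-distribution images at one layer.
    x:          (J,)   activations of the test image at the same layer.
    """
    m = background.shape[0]
    exceed = (background >= x[np.newaxis, :]).sum(axis=0)
    return (exceed + 1) / (m + 1)


def berk_jones_scan(pvalues: np.ndarray, alpha_max: float = 0.5) -> float:
    """Maximum Berk-Jones score over subsets of nodes.

    For a fixed threshold alpha, the best subset is exactly the nodes with
    p <= alpha, and its score reduces to N_alpha * log(1 / alpha). Trying
    alpha at each sorted p-value (up to alpha_max) covers every candidate.
    """
    p_sorted = np.sort(pvalues)
    k = np.arange(1, p_sorted.size + 1)      # N_alpha when alpha = p_sorted[k-1]
    scores = k * np.log(1.0 / p_sorted)
    valid = p_sorted <= alpha_max
    return float(scores[valid].max()) if valid.any() else 0.0


rng = np.random.default_rng(0)
# Hypothetical stand-ins for activations (rows = images, columns = nodes).
background = rng.normal(size=(500, 64))      # in-distribution training images
calibration = rng.normal(size=(200, 64))     # held-out in-distribution images
ood_sample = rng.normal(size=64)
ood_sample[:16] += 4.0                       # a subset of nodes fires unusually high


def ood_score(x: np.ndarray) -> float:
    return berk_jones_scan(empirical_pvalues(background, x))


# Flag a sample as OOD if its score exceeds most in-distribution scores.
threshold = np.quantile([ood_score(x) for x in calibration], 0.95)
print(ood_score(ood_sample) > threshold)
```

The scan only needs activations from a frozen classifier, which fits the abstract's claim that any pre-trained skin disease model can be used. The p-values make the score independent of each node's activation scale.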