Self-supervised representation learning has been extremely successful in medical image analysis, as it requires no human annotations to provide transferable representations for downstream tasks. Recent self-supervised learning methods are dominated by noise-contrastive estimation (NCE, also known as contrastive learning), which aims to learn invariant visual representations by contrasting one homogeneous image pair with a large number of heterogeneous image pairs in each training step. Nonetheless, NCE-based approaches still suffer from one major problem: a single homogeneous pair is not enough to extract robust and invariant semantic information. Inspired by the archetypical triplet loss, we propose GraVIS, which is specifically optimized for learning self-supervised features from dermatology images by grouping homogeneous dermatology images while separating heterogeneous ones. In addition, a hardness-aware attention mechanism is introduced to emphasize homogeneous image views with similar appearance over dissimilar homogeneous ones. GraVIS significantly outperforms its transfer learning and self-supervised learning counterparts in both lesion segmentation and disease classification tasks, sometimes by 5 percent under extremely limited supervision. More importantly, when equipped with the pre-trained weights provided by GraVIS, a single model can achieve better results than winners of the well-known ISIC 2017 challenge that rely heavily on ensemble strategies.
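The grouping idea above can be illustrated with a minimal sketch. It is not the GraVIS loss itself, whose exact form the abstract does not give. It assumes a hinge-style triplet objective over several homogeneous (positive) views, with a softmax weighting that favors positives already similar to the anchor. The function names, `margin`, and `temperature` are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(anchor, positives, temperature=0.5):
    """Assumed weighting: positives closer to the anchor get larger weights."""
    sims = [cosine(anchor, p) for p in positives]
    return softmax([s / temperature for s in sims])


def grouped_triplet_loss(anchor, positives, negatives,
                         margin=0.5, temperature=0.5):
    """Weighted triplet-style hinge loss over many positives and negatives.

    Illustrative only: each positive contributes the mean hinge penalty
    against all negatives, scaled by its attention weight.
    """
    pos_sims = [cosine(anchor, p) for p in positives]
    neg_sims = [cosine(anchor, n) for n in negatives]
    weights = attention_weights(anchor, positives, temperature)
    loss = 0.0
    for w, ps in zip(weights, pos_sims):
        hinge = sum(max(0.0, margin + ns - ps) for ns in neg_sims)
        loss += w * hinge / len(neg_sims)
    return loss
```

In this sketch, the loss is zero once every positive is more similar to the anchor than every negative by at least `margin`, so several homogeneous views are pulled together at once rather than a single pair.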