Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise

In recent years, deep learning has been used increasingly in histopathological applications. While these approaches have shown great potential, deep learning models deployed in high-risk environments must be able to judge their own uncertainty and reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of whole-slide images under domain shift, using the H&E-stained Camelyon17 breast cancer dataset. Although histopathological data are known to be subject to strong domain shift and label noise, to our knowledge this is the first work to compare the most common uncertainty estimation methods under these conditions. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation, and combinations thereof. We observe that ensembles of methods generally lead to higher accuracy and better calibration, and that Test-Time Data Augmentation can be a promising alternative when an appropriate set of augmentations is chosen. Across methods, rejecting the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we compare these methods under varying levels of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise, and we evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation for histopathological data.
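The rejection step described in the abstract can be illustrated with a minimal sketch. It assumes that each tile comes with T softmax outputs (for example from Monte-Carlo Dropout passes, Deep Ensemble members, or Test-Time Augmentation views), averages them into a predictive distribution, and uses predictive entropy as the uncertainty score. Entropy is an assumption here: the abstract does not say which score the authors use. The function names (`predictive_mean`, `predictive_entropy`, `accuracy_after_rejection`) are hypothetical and are not taken from the authors' published framework.

```python
import math
from typing import Sequence


def predictive_mean(samples: Sequence[Sequence[float]]) -> list[float]:
    """Average T softmax vectors (e.g. MC-Dropout passes, ensemble members or TTA views)."""
    t = len(samples)
    return [sum(s[c] for s in samples) / t for c in range(len(samples[0]))]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the averaged class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accuracy_after_rejection(tile_samples, labels, reject_fraction):
    """Reject the most uncertain fraction of tiles; return (accuracy on kept tiles, number kept)."""
    scored = []
    for samples, y in zip(tile_samples, labels):
        mean = predictive_mean(samples)
        pred = max(range(len(mean)), key=mean.__getitem__)  # argmax of the mean prediction
        scored.append((predictive_entropy(mean), pred == y))
    scored.sort(key=lambda item: item[0])  # most certain tiles first
    n_keep = len(scored) - int(reject_fraction * len(scored))
    kept = scored[:n_keep]
    if not kept:
        return float("nan"), 0
    return sum(correct for _, correct in kept) / len(kept), len(kept)


if __name__ == "__main__":
    # Toy data (hypothetical): 4 tiles, 2 classes, 3 stochastic predictions per tile.
    tiles = [
        [[0.9, 0.1], [0.95, 0.05], [0.85, 0.15]],  # confident, correct
        [[0.2, 0.8], [0.1, 0.9], [0.15, 0.85]],    # confident, correct
        [[0.6, 0.4], [0.4, 0.6], [0.55, 0.45]],    # uncertain, wrong
        [[0.7, 0.3], [0.8, 0.2], [0.75, 0.25]],    # fairly confident, correct
    ]
    labels = [0, 1, 1, 0]
    print(accuracy_after_rejection(tiles, labels, 0.0))   # (0.75, 4)
    print(accuracy_after_rejection(tiles, labels, 0.25))  # (1.0, 3)
```

In this toy example the misclassified tile also has the highest entropy, so rejecting the top 25% removes exactly that tile and raises accuracy on the retained tiles. Real data will not separate this cleanly; the sketch only shows the mechanics of uncertainty-based rejection.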
