Whole slide images (WSIs) pose unique challenges when training deep learning models. They are very large which makes it necessary to break each image down into smaller patches for analysis, image features have to be extracted at multiple scales in order to capture both detail and context, and extreme class imbalances may exist. Significant progress has been made in the analysis of these images, thanks largely due to the availability of public annotated datasets. We postulate, however, that even if a method scores well on a challenge task, this success may not translate to good performance in a more clinically relevant workflow. Many datasets consist of image patches which may suffer from data curation bias; other datasets are only labelled at the whole slide level and the lack of annotations across an image may mask erroneous local predictions so long as the final decision is correct. In this paper, we outline the differences between patch or slide-level classification versus methods that need to localize or segment cancer accurately across the whole slide, and we experimentally verify that best practices differ in both cases. We apply a binary cancer detection network on post neoadjuvant therapy breast cancer WSIs to find the tumor bed outlining the extent of cancer, a task which requires sensitivity and precision across the whole slide. We extensively study multiple design choices and their effects on the outcome, including architectures and augmentations. Furthermore, we propose a negative data sampling strategy, which drastically reduces the false positive rate (7% on slide level) and improves each metric pertinent to our problem, with a 15% reduction in the error of tumor extent.
翻译:整个幻灯片图像( SISI) 在培训深层次学习模型时构成独特的挑战。 它们非常庞大, 使得有必要将每张图像分成小块进行分析, 图像特征必须在多个尺度上绘制, 以捕捉细节和背景, 极端阶级失衡可能存在。 在分析这些图像方面已经取得了显著进展, 主要是因为有公开附加说明的数据集。 然而, 我们假设, 即使方法在挑战性任务上得分很高, 这一成功可能不会转化为临床上更相关的工作流程的良好表现。 许多数据集包含可能受到数据整理偏差影响的图像补丁; 其他的数据集只标出整个幻灯片级别, 而图像上缺乏说明可能掩盖错误的本地预测, 以至于最终决定是正确的。 在本文件中, 我们概述了补丁或幻灯片等级分类与需要在整个幻灯片中准确定位或分层癌症的方法之间的差异, 我们实验性地证实两种情况下的最佳做法都不同。 我们用二进制癌症检测网络在乳腺癌后治疗中可能存在偏差; 其它的数据集只标只在整个幻灯片级别贴上贴标签, 并且要通过反复的缩略度来研究, 将癌症结果的精度排序。