Deep learning-based diagnostic systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data, which are impractical to obtain in pathology because of the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad hoc pretext tasks. However, the self-supervised training process is time-consuming and often leads to subpar feature representations due to a lack of constraints on the learnt feature space, a problem that is particularly prominent under data imbalance. In this work, we propose to actively sample the training set using a handful of labels and a small proxy network, decreasing the sample requirement by 93% and training time by 99%.