Pathological image analysis is an important process for detecting abnormalities such as cancer from cell images. However, because these images are generally very large, detailed annotation is costly, which makes machine learning techniques difficult to apply. Two ways to improve abnormality detection while keeping annotation costs low are to use only slide-level labels or to use information from another dataset that has already been labeled. However, such weak supervision alone often fails to deliver sufficient performance. In this paper, we propose a new task setting that aims to improve classification performance on the target dataset without increasing annotation costs. To address this setting, we propose a pipeline that combines multiple instance learning (MIL) and domain adaptation (DA) methods. Furthermore, to combine the supervisory information of both methods effectively, we propose a method for creating high-confidence pseudo-labels. We conducted experiments on a pathological image dataset that we created for this study and showed that the proposed method significantly improves classification performance compared with existing methods.
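The abstract does not say how the high-confidence pseudo-labels are created, so the following is only a generic sketch of the idea, not the authors' method. It assumes that a classifier adapted from the labeled source dataset (the DA side) outputs a probability of abnormality for each patch, and that each target slide has a slide-level label (the MIL side). Under the standard MIL assumption, every patch in a negative slide is negative; patches in positive slides receive a pseudo-label only when the prediction is confident. The function name `select_pseudo_labels`, its parameters, and the threshold value 0.9 are all hypothetical.

```python
from typing import Dict, List, Optional


def select_pseudo_labels(
    probs: List[float],
    slide_ids: List[str],
    slide_labels: Dict[str, int],
    threshold: float = 0.9,  # hypothetical confidence cutoff, not from the paper
) -> List[Optional[int]]:
    """Return one pseudo-label per patch: 1, 0, or None (left unlabeled).

    probs[i] is the predicted probability that patch i is abnormal,
    slide_ids[i] names the slide that patch i was cut from, and
    slide_labels maps each slide to 0 (normal) or 1 (abnormal).
    """
    pseudo: List[Optional[int]] = []
    for p, sid in zip(probs, slide_ids):
        if slide_labels[sid] == 0:
            # Standard MIL assumption: a negative slide has no positive patches.
            pseudo.append(0)
        elif p >= threshold:
            # Confident positive prediction inside a positive slide.
            pseudo.append(1)
        elif p <= 1.0 - threshold:
            # Confident negative prediction inside a positive slide.
            pseudo.append(0)
        else:
            # Too uncertain: leave this patch without a pseudo-label.
            pseudo.append(None)
    return pseudo


# Toy usage with made-up numbers: slide "a" is abnormal, slide "b" is normal.
labels = select_pseudo_labels(
    probs=[0.95, 0.50, 0.05, 0.99],
    slide_ids=["a", "a", "a", "b"],
    slide_labels={"a": 1, "b": 0},
)
print(labels)  # [1, None, 0, 0]
```

In this sketch the last patch is labeled 0 despite its high predicted probability, because the slide-level label overrides the patch-level prediction. Whether the proposed pipeline resolves such conflicts this way is not stated in the abstract.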