The performance of learning-based algorithms improves with the amount of labelled training data. Yet manual annotation is tedious and expensive, especially in medical image segmentation. To reduce the labelling burden, active learning (AL) selects the most informative samples from the unlabelled set to be annotated and added to the labelled training set. Most AL work, however, has focused on classification or on limited segmentation of natural images, even though AL is highly desirable for the difficult task of medical image segmentation. Moreover, uncertainty-based AL approaches notoriously offer sub-optimal batch-query strategies, while diversity-based methods tend to be computationally expensive. Beyond these methodological hurdles, random sampling has proven an extremely difficult baseline to outperform across varying learning and sampling conditions. This work aims to take advantage of the diversity and speed offered by random sampling to improve the selection made by uncertainty-based AL methods for segmenting medical images. More specifically, we propose to compute uncertainty at the level of batches instead of individual samples, through an original use of stochastic batches during sampling in AL. Exhaustive experiments on medical image segmentation, with an illustration on prostate MRI, show that the benefits of stochastic batches during sample selection are robust to a variety of changes in the training and sampling procedures.
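As an illustration only, batch-level uncertainty with stochastic batches could be sketched as follows. The abstract does not specify how batches are generated or how uncertainty is aggregated, so this sketch assumes that candidate batches are drawn uniformly at random from the unlabelled pool and that a batch is scored by the mean of its per-sample uncertainties. The function name `select_stochastic_batch` and its parameters are hypothetical, not the authors' implementation.

```python
import numpy as np


def select_stochastic_batch(uncertainty, batch_size, num_batches, rng):
    """Return the random candidate batch with the highest mean uncertainty.

    uncertainty: 1-D array with one score per unlabelled sample (for
        example, the mean pixel-wise entropy of a predicted segmentation).
    batch_size: number of samples to send for annotation.
    num_batches: number of random candidate batches to compare.
    rng: a numpy.random.Generator, passed in for reproducibility.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    n = uncertainty.shape[0]
    if batch_size > n:
        raise ValueError("batch_size exceeds the size of the unlabelled pool")

    best_idx, best_score = None, -np.inf
    for _ in range(num_batches):
        # Assumption: each candidate batch is a uniform random draw
        # without replacement, which is where the diversity comes from.
        candidate = rng.choice(n, size=batch_size, replace=False)
        # Assumption: batch-level uncertainty is the mean sample score.
        score = uncertainty[candidate].mean()
        if score > best_score:
            best_idx, best_score = candidate, score
    return np.sort(best_idx), best_score
```

With `num_batches=1` this reduces to plain random sampling, and as `num_batches` grows the selection leans increasingly towards the most uncertain samples, so under these assumptions the parameter trades diversity against uncertainty.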