Which volume to annotate next is a challenging problem when building medical imaging datasets for deep learning. One promising approach to this question is active learning (AL). However, it remains difficult to determine which AL algorithms and acquisition functions are most useful for which datasets. The problem is exacerbated when there is no labeled data to start with, because one must still decide which volumes to label first; this is known as the cold start problem in AL. We propose two novel strategies for AL specifically for 3D image segmentation. First, we tackle the cold start problem by proposing a proxy task and then using the uncertainty generated from the proxy task to rank the unlabeled data for annotation. Second, we craft a two-stage learning framework for each active iteration, in which the unlabeled data is also used in the second stage as part of a semi-supervised fine-tuning strategy. We show the promise of our approach on two well-known large public datasets from the Medical Segmentation Decathlon. The results indicate that both the initial data selection and the semi-supervised framework yield significant improvements for several AL strategies.
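The cold-start step ranks unlabeled volumes by the uncertainty of a model trained on a proxy task. The abstract does not specify the proxy task or the uncertainty measure. The following is therefore only a minimal sketch under stated assumptions. It assumes a hypothetical proxy model has already produced per-voxel class probabilities of shape (C, D, H, W) for each unlabeled volume. It scores each volume by mean per-voxel predictive entropy, a common uncertainty measure that is not necessarily the one the authors use. The names `mean_voxel_entropy` and `rank_for_annotation` are illustrative.

```python
import numpy as np


def mean_voxel_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Mean per-voxel predictive entropy of one volume.

    Assumption: `probs` has shape (C, D, H, W) and sums to 1 over axis 0
    (class probabilities from a hypothetical proxy-task model).
    """
    # Shannon entropy per voxel, computed over the class axis.
    entropy = -np.sum(probs * np.log(probs + eps), axis=0)  # shape (D, H, W)
    return float(entropy.mean())


def rank_for_annotation(proxy_probs: dict[str, np.ndarray], budget: int) -> list[str]:
    """Return IDs of the `budget` most uncertain volumes, most uncertain first."""
    scores = {vid: mean_voxel_entropy(p) for vid, p in proxy_probs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Toy example: three 2-class 4x4x4 "volumes" with different confidence levels.
shape = (2, 4, 4, 4)
confident = np.zeros(shape)
confident[0], confident[1] = 0.99, 0.01
uniform = np.full(shape, 0.5)
mixed = np.zeros(shape)
mixed[0], mixed[1] = 0.8, 0.2

probs = {"vol_a": confident, "vol_b": uniform, "vol_c": mixed}
print(rank_for_annotation(probs, budget=2))  # ['vol_b', 'vol_c']
```

The volume with uniform predictions has the highest entropy and is selected first. In a real pipeline, the selected IDs would be sent for annotation to form the initial labeled set.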