Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale

Acquiring the large-scale medical image data needed to train machine learning algorithms is frequently intractable because expert-driven annotation is prohibitively expensive. Recent datasets extracted from hospital archives, e.g., DeepLesion, have begun to address this problem. However, these are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of its lesions unlabeled. Thus, effective methods to harvest missing annotations are critical for continued progress in medical image analysis. This is the goal of our work, where we develop a powerful system to harvest missing lesions from the DeepLesion dataset at high precision. Accepting the need for some degree of expert labor to achieve high fidelity, we exploit a small fully-labeled subset of medical image volumes and use it to intelligently mine annotations from the remainder. To do this, we chain together a highly sensitive lesion proposal generator and a very selective lesion proposal classifier. While our framework is generic, we optimize our performance by proposing a 3D contextual lesion proposal generator and by using a multi-view multi-scale lesion proposal classifier. These produce harvested and hard-negative proposals, which we then re-use to finetune our proposal generator with a novel hard negative suppression loss, continuing this process until no extra lesions are found. Extensive experimental analysis demonstrates that our method can harvest an additional 9,805 lesions while keeping precision above 90%. To demonstrate the benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants trained only on the original annotations, with a boost in average precision of 7% to 10%. We open source our annotations at https://github.com/JimmyCai91/DeepLesionAnnotation.
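
The iterative harvesting loop described above can be pictured with a minimal control-flow sketch. This is an illustration under stated assumptions, not the released implementation: `harvest`, `propose`, `is_lesion`, `finetune`, and `max_rounds` are hypothetical names, proposals are reduced to hashable identifiers, and every proposal the classifier rejects is treated as a hard negative, which simplifies whatever selection rule the full method applies. The hard negative suppression loss itself is hidden inside the `finetune` callback.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

# Hypothetical stand-in: any hashable identifier of a candidate lesion.
Proposal = Hashable


def harvest(
    propose: Callable[[], Iterable[Proposal]],
    is_lesion: Callable[[Proposal], bool],
    finetune: Callable[[Set[Proposal], Set[Proposal]], None],
    max_rounds: int = 10,
) -> Tuple[Set[Proposal], Set[Proposal]]:
    """Alternate proposal generation, classification, and finetuning.

    propose   -- sensitive generator; returns the current candidate proposals
    is_lesion -- selective classifier; True if a proposal is accepted
    finetune  -- updates the generator from (harvested, hard_negatives)
    """
    harvested: Set[Proposal] = set()
    hard_negatives: Set[Proposal] = set()

    for _ in range(max_rounds):
        proposals = set(propose())
        accepted = {p for p in proposals if is_lesion(p)}

        # Assumption: every rejected proposal counts as a hard negative.
        hard_negatives |= proposals - accepted

        new_lesions = accepted - harvested
        if not new_lesions:
            # Stop once a round yields no additional lesions.
            break

        harvested |= new_lesions
        finetune(harvested, hard_negatives)

    return harvested, hard_negatives
```

The loop stops either when a round adds nothing to the harvested set, mirroring the stopping rule in the text, or after `max_rounds` rounds, a safety cap added for this sketch.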
