Deep learning models with large learning capacities often overfit to medical imaging datasets, because training sets are usually small owing to the significant time and financial costs of medical data acquisition and labelling. Data augmentation is therefore often used to expand the available training data and to improve generalization. However, augmentation strategies are often chosen on an ad-hoc basis without justification. In this paper, we present an augmentation policy search method with the goal of improving model classification performance. We include in the policy search additional transformations that are commonly used in medical image analysis and evaluate their performance. In addition, we extend the policy search to include non-linear mixed-example data augmentation strategies. Using these learned policies, we show that principled data augmentation for medical image model training can lead to significant improvements in ultrasound fetal standard plane classification, with an overall average F1-score improvement of 7.0% over naive data augmentation strategies. We find that the learned representations of ultrasound images are better clustered and defined with optimized data augmentation.
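To make the idea of a non-linear mixed-example augmentation concrete, the following is a minimal sketch under stated assumptions, not the method used in this paper. It assumes one possible non-linear mix, vertical concatenation of two images, and a hypothetical policy entry with a probability `prob` and a mixing fraction `lam` that a policy search could tune. The function names `vertical_concat_mix` and `maybe_mix` and the policy keys are illustrative only.

```python
import random


def vertical_concat_mix(img_a, img_b, label_a, label_b, lam):
    """Take the top `lam` fraction of rows from img_a and the rest from img_b.

    Images are lists of rows with equal height; labels are one-hot lists.
    The label is mixed in proportion to the rows each image contributes.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same height")
    height = len(img_a)
    split = round(lam * height)  # number of rows taken from img_a
    mixed_img = [row[:] for row in img_a[:split]] + [row[:] for row in img_b[split:]]
    frac = split / height  # actual fraction after rounding to whole rows
    mixed_label = [frac * a + (1 - frac) * b for a, b in zip(label_a, label_b)]
    return mixed_img, mixed_label


def maybe_mix(sample, partner, policy, rng=random):
    """Apply the mix with probability policy['prob'], else return sample unchanged.

    `sample` and `partner` are (image, one_hot_label) pairs; `policy` is a
    hypothetical dict such as {"prob": 0.5, "lam": 0.75}.
    """
    if rng.random() < policy["prob"]:
        return vertical_concat_mix(
            sample[0], partner[0], sample[1], partner[1], policy["lam"]
        )
    return sample
```

In a search setting, `prob` and `lam` would be among the quantities being optimized against validation performance, alongside the parameters of any conventional transformations in the policy.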