Automated data augmentation, which aims to design augmentation policies automatically, has recently drawn growing research interest. Many previous auto-augmentation methods use a Density Matching strategy, which evaluates policies by their test-time augmentation performance. In this paper, we demonstrate theoretically and empirically that the training and validation sets of small-scale medical image datasets are inconsistent, a phenomenon we refer to as in-domain sampling bias. We then show that in-domain sampling bias may make Density Matching inefficient. To address this problem, we propose an improved augmentation search strategy, Augmented Density Matching, which randomly samples policies from a prior distribution for training. Moreover, we propose an efficient automated machine learning (AutoML) algorithm that unifies the search over data augmentation and neural architecture. Experimental results indicate that the proposed methods outperform state-of-the-art approaches on MedMNIST, a pioneering benchmark designed for AutoML in medical image analysis.
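To make the idea of randomly sampling policies from a prior distribution for training concrete, a minimal sketch follows. It is an illustration under stated assumptions, not the implementation evaluated in this work: the operation names, the uniform prior, the two-operation policy length, and the helper names `sample_policy` and `policies_for_training` are all hypothetical.

```python
import random

# Hypothetical operation names; the abstract does not list the search space.
OPS = ["rotate", "shear_x", "translate_y", "contrast", "brightness"]


def sample_policy(rng, num_ops=2):
    """Draw one policy: a list of (operation, probability, magnitude) triples.

    The prior is uniform over operations, probabilities, and magnitudes,
    which is an assumption made purely for illustration.
    """
    return [(rng.choice(OPS), rng.random(), rng.random()) for _ in range(num_ops)]


def policies_for_training(num_batches, seed=0):
    """Sample a fresh policy from the prior for every training batch."""
    rng = random.Random(seed)
    return [sample_policy(rng) for _ in range(num_batches)]


if __name__ == "__main__":
    for step, policy in enumerate(policies_for_training(3)):
        print(step, policy)
```

In a real training loop, each sampled policy would be applied to the corresponding batch before the forward pass, so that the model sees a range of augmentations drawn from the prior instead of a single fixed policy.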