Real-world training data usually exhibits a long-tailed distribution, in which a few majority classes contain significantly more samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with the recently proposed implicit semantic data augmentation (ISDA) algorithm, which produces diverse augmented samples by translating deep features along many semantically meaningful directions. Importantly, since ISDA estimates class-conditional statistics to obtain these semantic directions, we find it ineffective on minority classes, whose training data are insufficient for reliable estimation. To this end, we propose a novel approach that automatically learns transformed semantic directions via meta-learning. Specifically, the augmentation strategy is dynamically optimized during training to minimize the loss on a small balanced validation set, which is approximated via a meta update step. Extensive empirical results on CIFAR-LT-10/100, ImageNet-LT, and iNaturalist 2017/2018 validate the effectiveness of our method.
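To make the core idea concrete, the following is a minimal sketch of semantic data augmentation in feature space: per-class covariance matrices are estimated from deep features, and augmented samples are generated by translating a feature along directions drawn from the class's covariance. This is the explicit-sampling view of the mechanism; the actual ISDA algorithm instead optimizes a closed-form upper bound of the expected loss, and the function names here (`class_conditional_stats`, `semantic_augment`) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def class_conditional_stats(features, labels, num_classes):
    """Estimate per-class feature covariances (the source of semantic directions)."""
    covs = []
    for c in range(num_classes):
        fc = features[labels == c]
        # With few minority-class samples this estimate is unreliable --
        # exactly the failure mode that motivates learning the directions
        # with meta-learning instead of plain estimation.
        if len(fc) > 1:
            covs.append(np.cov(fc, rowvar=False))
        else:
            covs.append(np.zeros((features.shape[1], features.shape[1])))
    return covs

def semantic_augment(feature, cov, strength=0.5, n_aug=4):
    """Translate one deep feature along directions sampled from N(0, strength * cov)."""
    deltas = rng.multivariate_normal(np.zeros(len(feature)), strength * cov, size=n_aug)
    return feature + deltas  # shape: (n_aug, feature_dim)
```

The `strength` parameter plays the role of the augmentation intensity that the meta-learning step would tune: it is updated so that a model trained on the augmented features incurs low loss on a small balanced validation set.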