Mixup is a commonly adopted data augmentation technique for image classification. Recent advances in mixup methods primarily focus on mixing based on saliency. However, many saliency detectors require intense computation and are especially burdensome for parameter-heavy transformer models. To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens. TokenMixup provides saliency-aware data augmentation roughly 15× faster than gradient-based methods. Moreover, we introduce a variant of TokenMixup which mixes tokens within a single instance, thereby enabling multi-scale feature augmentation. Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods. We also reach state-of-the-art performance on CIFAR-100 among from-scratch transformer models. Code is available at https://github.com/mlvlab/TokenMixup.
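To make the token-level mixing concrete, below is a minimal PyTorch sketch of the core idea the abstract describes: using attention scores as a cheap saliency signal and replacing a sample's least salient tokens with another sample's most salient ones. This is a simplified illustration under our own assumptions, not the authors' exact algorithm; the function and parameter names (e.g. `attention_guided_token_mixup`, `ratio`) are hypothetical, and the released implementation at the linked repository should be consulted for the actual method.

```python
import torch

def attention_guided_token_mixup(tokens_a, tokens_b, attn_a, attn_b, ratio=0.5):
    """Simplified sketch of attention-guided token-level mixing.

    tokens_a, tokens_b: (B, N, D) token embeddings of two batches
    attn_a, attn_b:     (B, N) per-token attention (saliency) scores
    ratio:              fraction of tokens in `tokens_a` to replace
    """
    B, N, D = tokens_a.shape
    k = int(N * ratio)

    # Indices of the k LEAST salient tokens of sample A ...
    low_idx = attn_a.topk(k, dim=1, largest=False).indices   # (B, k)
    # ... and the k MOST salient tokens of sample B.
    high_idx = attn_b.topk(k, dim=1, largest=True).indices   # (B, k)

    # Replace A's low-saliency tokens with B's high-saliency tokens,
    # aiming to maximize the saliency of the mixed token set.
    mixed = tokens_a.clone()
    batch_idx = torch.arange(B).unsqueeze(1)                 # (B, 1), broadcasts over k
    mixed[batch_idx, low_idx] = tokens_b[batch_idx, high_idx]

    # Label-mixing weight: fraction of tokens taken from B.
    lam = k / N
    return mixed, lam
```

Because the saliency signal here is read off attention maps already computed by the transformer's forward pass, no extra backward pass is needed, which is the source of the speedup over gradient-based saliency methods. The single-instance variant mentioned above would mix tokens drawn from different layers or scales of the same sample rather than from a second sample.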