Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization (ERM). Yet, its extensions focus on the definition of interpolation and the space where it takes place, while the augmentation itself is less studied: for a mini-batch of size $m$, most methods interpolate between $m$ pairs with a single scalar interpolation factor $\lambda$. In this work, we make progress in this direction by introducing MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple. On sequence data, we further extend to dense interpolation and loss computation over all spatial positions. Overall, we increase the number of tuples per mini-batch by orders of magnitude at little additional cost. This is possible by interpolating at the very last layer before the classifier. Finally, to address inconsistencies due to linear target interpolation, we introduce a self-distillation approach to generate and interpolate synthetic targets. We empirically show that our contributions yield significant improvements over state-of-the-art mixup methods on four benchmarks. By analyzing the embeddings, we observe that the classes are more tightly clustered and more uniformly spread over the embedding space, which explains the improved behavior.
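To make the core idea concrete, the following is a minimal sketch of MultiMix-style interpolation at the last layer before the classifier. It is our own illustration under stated assumptions, not the authors' reference implementation: the function name `multimix_interpolate`, the Dirichlet parametrization of the per-tuple vectors $\lambda$, and the concentration value `alpha` are illustrative choices. Each of the $n$ generated tuples mixes all $m$ embeddings of the mini-batch with its own vector $\lambda$, and the targets are mixed with the same weights.

```python
import torch
import torch.nn.functional as F

def multimix_interpolate(z, y_onehot, n=1000, alpha=1.0):
    """Sketch of MultiMix-style mixing at the last layer (illustrative only).

    z:        (m, d) embeddings from the layer just before the classifier
    y_onehot: (m, C) one-hot (or soft) targets for the mini-batch
    n:        number of interpolated tuples to generate
    returns:  (n, d) mixed embeddings and (n, C) mixed soft targets
    """
    m = z.size(0)
    # One interpolation vector per generated tuple; each vector lies on the
    # (m-1)-simplex, so every tuple interpolates over the whole mini-batch.
    dirichlet = torch.distributions.Dirichlet(torch.full((m,), alpha))
    lam = dirichlet.sample((n,)).to(z.device)   # (n, m)
    z_mix = lam @ z                             # (n, d) mixed embeddings
    y_mix = lam @ y_onehot                      # (n, C) mixed targets
    return z_mix, y_mix

# Hypothetical usage with an encoder/classifier pair:
# z = encoder(x)                                      # (m, d)
# z_mix, y_mix = multimix_interpolate(z, F.one_hot(y, C).float(), n=1000)
# logits = classifier(z_mix)                          # (n, C)
# loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

Because the mixing happens on $d$-dimensional embeddings rather than on raw inputs, generating orders of magnitude more tuples than $m$ adds only the cost of an $(n \times m)$ by $(m \times d)$ matrix product and $n$ extra classifier evaluations.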