We improve the recently proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between $5\times$ and $16\times$ less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach $93.73\%$ accuracy (compared to MixMatch's accuracy of $93.58\%$ with $4{,}000$ examples) and a median accuracy of $84.92\%$ with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
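The distribution alignment step described above can be sketched in a few lines: a prediction on an unlabeled example is rescaled by the ratio of the ground-truth class marginal to a running estimate of the model's average prediction, then renormalized to sum to one. This is a minimal illustrative sketch, not the paper's implementation; the function name, the uniform class marginal, and the example numbers are assumptions for demonstration.

```python
import numpy as np

def distribution_alignment(q, class_marginal, running_marginal):
    """Align a model prediction q with the ground-truth class marginal.

    q: predicted class distribution for one unlabeled example.
    class_marginal: marginal distribution of ground-truth labels
        (here assumed uniform; a hypothetical choice for illustration).
    running_marginal: running average of model predictions on
        unlabeled data.
    """
    # Scale q toward the label marginal, then renormalize to a
    # valid probability distribution.
    aligned = q * (class_marginal / running_marginal)
    return aligned / aligned.sum()

# Example: the model over-predicts class 0 relative to its running
# average, so alignment redistributes mass (numbers are illustrative).
q = np.array([0.7, 0.2, 0.1])
class_marginal = np.full(3, 1 / 3)
running_marginal = np.array([0.5, 0.3, 0.2])
print(distribution_alignment(q, class_marginal, running_marginal))
```

The renormalization keeps the output a proper distribution even though the elementwise rescaling does not preserve the sum.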