Recent advances in deep unsupervised learning have renewed interest in semi-supervised methods, which can learn from both labeled and unlabeled data. Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. We show that consistency regularization leads to flatter but narrower optima. We also show that the test error surface for these methods is approximately convex in regions of weight space traversed by SGD. Inspired by these observations, we propose to train consistency-based semi-supervised models with stochastic weight averaging (SWA), a recent method that averages weights along the trajectory of SGD. We also develop fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With fast-SWA we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100 across many different numbers of observed training labels. For example, we achieve 95.0% accuracy on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 93.7%. We also improve the best known accuracy for domain adaptation from CIFAR-10 to STL from 80% to 83%. Finally, we show that with fast-SWA the simple $\Pi$ model becomes state-of-the-art for large labeled settings.
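To make the averaging scheme concrete, the following is a minimal sketch of fast-SWA in PyTorch under stated assumptions: the `model`, `train_loader`, and `loss_fn` are taken as given, and the hyperparameters (`swa_start`, `cycle_len`, `avg_every`, `lr_max`, `lr_min`) are illustrative placeholders rather than the paper's tuned values.

```python
# Sketch of fast-SWA: a cyclical learning rate with weight averaging at
# multiple points within each cycle (vanilla SWA averages only at cycle ends).
import copy
import torch

def fast_swa_train(model, train_loader, loss_fn, epochs=180,
                   swa_start=150, cycle_len=30, avg_every=3,
                   lr_max=0.1, lr_min=0.001, device="cpu"):
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    swa_model = copy.deepcopy(model)  # holds the running average of weights
    n_avg = 0
    iters_per_epoch = len(train_loader)
    for epoch in range(epochs):
        for i, (x, y) in enumerate(train_loader):
            if epoch >= swa_start:
                # Linearly decay the learning rate within each cycle.
                t = ((epoch - swa_start) % cycle_len + i / iters_per_epoch) / cycle_len
                lr = lr_max - (lr_max - lr_min) * t
            else:
                lr = lr_max
            for g in opt.param_groups:
                g["lr"] = lr
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
        # fast-SWA: fold the current weights into the running average every
        # few epochs, so several points per cycle contribute to the average.
        if epoch >= swa_start and (epoch - swa_start) % avg_every == 0:
            n_avg += 1
            for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                p_avg.data += (p.data - p_avg.data) / n_avg  # incremental mean
    return swa_model
```

As is standard for SWA, the averaged model's batch normalization statistics should be recomputed with a forward pass over the training data before evaluation, since averaging weights invalidates the running estimates.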