Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100, over many different quantities of labeled training data. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%.
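To make the averaging procedure concrete, below is a minimal sketch of fast-SWA-style weight averaging under a cyclical learning rate, assuming a PyTorch training loop. The network, loss, and hyperparameters (`cycle_len`, `avg_every`, the cosine schedule) are illustrative placeholders, not the authors' implementation; the only point shown is averaging several SGD iterates within each learning-rate cycle.

```python
# Sketch of fast-SWA-style weight averaging with a cyclical learning rate.
# The model and loss are stand-ins; only the averaging logic is illustrated.
import copy
import math
import torch
import torch.nn as nn

def cyclical_lr(base_lr, step, cycle_len):
    """Cosine-annealed learning rate, restarted every `cycle_len` steps."""
    t = (step % cycle_len) / cycle_len
    return 0.5 * base_lr * (1 + math.cos(math.pi * t))

model = nn.Linear(10, 2)              # stand-in for the student network
swa_model = copy.deepcopy(model)      # holds the running weight average
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

n_averaged = 0
cycle_len = 100    # SGD steps per learning-rate cycle
avg_every = 20     # fast-SWA: average several points *within* each cycle

for step in range(1000):
    # One SGD step on the (supervised + consistency) objective; here the
    # loss is a placeholder so the snippet runs on its own.
    for group in optimizer.param_groups:
        group["lr"] = cyclical_lr(0.1, step, cycle_len)
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # fast-SWA: fold the current weights into the running average several
    # times per cycle, instead of only once at the end of each cycle (SWA).
    if step % avg_every == 0:
        n_averaged += 1
        with torch.no_grad():
            for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                p_avg.mul_((n_averaged - 1) / n_averaged).add_(p / n_averaged)

# `swa_model` now holds the averaged weights used for prediction.
```

In a real network with batch normalization, the averaged model's running statistics would additionally need to be recomputed with a forward pass over the training data before evaluation.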