This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations of this principle for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) improve the test accuracy of trained models. Furthermore, we discuss model-based examples (such as pretraining and knowledge distillation) from the perspective of general cyclical training and recommend some changes to the typical training methodology. In summary, this paper defines the general cyclical training concept and discusses several specific ways in which this concept can be applied to training neural networks. In the spirit of reproducibility, the code used in our experiments is available at \url{https://github.com/lnsmith54/CFL}.
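To make the "easy-hard-easy" idea concrete, the following is a minimal sketch of one manifestation, cyclical weight decay, applied to a PyTorch optimizer. The triangular schedule shape, the `wd_min`/`wd_max` values, and the helper function name are illustrative assumptions for this sketch, not the exact schedule used in the paper's experiments.

\begin{verbatim}
# Illustrative sketch (not the paper's exact schedule): a triangular cyclical
# weight-decay schedule that is low ("easy") at the start and end of training
# and peaks ("hard") at mid-training, applied to a PyTorch optimizer.
import torch

def cyclical_weight_decay(epoch, total_epochs, wd_min=1e-4, wd_max=1e-2):
    """Ramp weight decay from wd_min up to wd_max at mid-training,
    then back down to wd_min by the final epoch (assumed values)."""
    half = total_epochs / 2.0
    progress = epoch / half if epoch <= half else (total_epochs - epoch) / half
    return wd_min + (wd_max - wd_min) * progress

model = torch.nn.Linear(10, 2)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

total_epochs = 100
for epoch in range(total_epochs):
    wd = cyclical_weight_decay(epoch, total_epochs)
    for group in optimizer.param_groups:  # update the hyper-parameter in place
        group["weight_decay"] = wd
    # ... run one epoch of training here ...
\end{verbatim}

The same pattern (recompute a hyper-parameter each epoch and write it into the optimizer or data pipeline) could in principle be reused for the other cyclical hyper-parameters listed above, such as batch size or softmax temperature.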