Early stopping is a simple and widely used method to prevent over-training in neural networks. We develop theoretical results that reveal the relationship between the optimal early stopping time and the model dimension, as well as the sample size of the dataset, for certain linear models. Our results demonstrate two very different behaviors depending on whether the model dimension exceeds the number of features or falls below it. While most previous work on linear models focuses on the latter setting, we observe that in common deep learning tasks the model dimension often exceeds the number of features arising from the data, and we propose a model to study this setting. We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
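For concreteness, the sketch below shows how early stopping is commonly implemented for gradient descent on a generic linear least-squares model: training halts once a held-out validation loss stops improving, and the step at which this happens plays the role of the early stopping time studied here. This is a minimal illustration under assumed settings; the problem instance, names (`X_train`, `patience`), learning rate, and stopping rule are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of early stopping for gradient descent on a linear model.
# The data-generating process below is a generic noisy least-squares setup,
# not the specific model analyzed in the paper.

rng = np.random.default_rng(0)
n, d = 100, 50                          # sample size n, model dimension d
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n)
X_val = rng.normal(size=(n, d))
y_val = X_val @ w_true + 0.5 * rng.normal(size=n)

w = np.zeros(d)                         # start from the zero initialization
lr = 1e-2                               # illustrative learning rate
best_val, best_step, patience, bad_steps = np.inf, 0, 20, 0

for step in range(10_000):
    # gradient of the training loss (1/2n) * ||Xw - y||^2
    grad = X_train.T @ (X_train @ w - y_train) / n
    w -= lr * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_step, bad_steps = val_loss, step, 0
    else:
        bad_steps += 1
        if bad_steps >= patience:       # stop once validation loss plateaus
            break

print(f"stopped at step {step}; best validation loss {best_val:.4f} at step {best_step}")
```

In this sketch, `best_step` is the empirical analogue of the optimal early stopping time: the number of gradient steps after which further training only fits noise and degrades held-out performance.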