Mini-batch sub-sampling (MBSS) is favored in deep neural network training because it reduces the computational cost. However, it introduces an inherent sampling error, which makes selecting appropriate learning rates challenging. In a line search, this sampling error manifests either as bias or as variance. Dynamic MBSS re-samples a mini-batch at every function evaluation. As a result, dynamic MBSS produces point-wise discontinuous loss functions with smaller bias but larger variance than statically sampled loss functions. Dynamic MBSS offers larger data throughput during training, but the complexity introduced by the discontinuities must be resolved. This study extends the gradient-only surrogate (GOS), a line search method that uses quadratic approximation models built with only directional derivative information, to dynamic MBSS loss functions. We propose a gradient-only approximation line search (GOALS) with strong convergence characteristics and a defined optimality criterion. We investigate the performance of GOALS by applying it to several optimizers, including SGD, RMSprop and Adam, on ResNet-18 and EfficientNetB0. We also compare GOALS against other existing learning rate methods. We quantify both the best performing and the most robust algorithms. For the latter, we introduce a relative robustness criterion that quantifies the difference between an algorithm and the best performing algorithm for a given problem. The results show that training a model with the recommended learning rate for a class of search directions helps to reduce the model errors in multimodal cases.
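
The idea of a line search built only from directional derivatives can be illustrated with a minimal sketch. This is not the GOALS algorithm itself: the 1-D least-squares problem, the function names `directional_derivative`, `surrogate_step` and `train`, the trial step `alpha1`, and the fallback rule for non-positive curvature are all assumptions made for illustration. A quadratic model of the loss along the search direction has a linear directional derivative, so two derivative evaluations fix the model, and the step is taken to the point where the modeled derivative changes sign. Each derivative evaluation draws a fresh mini-batch, mimicking dynamic MBSS.

```python
import random


def directional_derivative(w, direction, batch):
    """Directional derivative of the mean loss 0.5*(w*x - y)**2 over `batch`."""
    grad = sum((w * x - y) * x for x, y in batch) / len(batch)
    return grad * direction


def surrogate_step(d0, d1, alpha1, growth=2.0):
    """Root of the linear derivative model through (0, d0) and (alpha1, d1).

    The linear derivative model corresponds to a quadratic model of the loss
    that is built without using any function values.
    """
    curvature = (d1 - d0) / alpha1
    if curvature <= 0.0:
        # No sign change is predicted ahead; growing the step is an
        # assumption of this sketch, not a rule taken from the paper.
        return growth * alpha1
    return -d0 / curvature


def train(data, w=0.0, alpha1=0.1, batch_size=8, steps=50, seed=0):
    """Descent on a scalar weight with a gradient-only step size."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Dynamic sampling: a new mini-batch for every derivative evaluation.
        g0 = directional_derivative(w, 1.0, rng.sample(data, batch_size))
        direction = -1.0 if g0 > 0 else 1.0  # descent direction in 1-D
        d0 = -abs(g0)  # derivative along the descent direction at alpha = 0
        d1 = directional_derivative(
            w + alpha1 * direction, direction, rng.sample(data, batch_size)
        )
        w += surrogate_step(d0, d1, alpha1) * direction
    return w
```

Because the two derivatives come from different mini-batches, the estimated curvature is noisy. The sketch handles only the simplest failure case (non-positive curvature) and makes no claim about convergence guarantees.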