A major challenge in current optimization research for deep learning is to automatically find optimal step sizes for each update step. The optimal step size is closely related to the shape of the loss along the direction of the update step. However, this shape has not yet been examined in detail. This work shows empirically that the batch loss along lines in the negative gradient direction is mostly locally convex and well suited for one-dimensional parabolic approximations. Exploiting this parabolic property, we introduce a simple and robust line search approach that performs loss-shape-dependent update steps. Our approach combines well-known methods such as parabolic approximation, line search, and conjugate gradient to perform efficiently. It surpasses other step-size estimation methods and competes with common optimization methods on a large variety of experiments, without the need for hand-designed step-size schedules. Thus, it is of interest for objectives where step-size schedules are unknown or do not perform well. Our extensive evaluation includes multiple comprehensive hyperparameter grid searches on several datasets and architectures. Finally, we provide a general investigation of exact line searches in the context of batch losses and exact losses, including their relation to our line search approach.
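To make the core idea concrete, below is a minimal PyTorch sketch of one parabolic-approximation line search step, under stated assumptions rather than the paper's exact procedure: it fits a one-dimensional parabola to the batch loss along the normalized negative gradient direction, using the loss and directional derivative at the current point plus one extra loss evaluation, and steps to the parabola's minimum. The closure `loss_fn`, the measuring step `mu`, and the cap `max_step` are illustrative names and hyperparameters, and `loss_fn` is assumed to re-evaluate the loss on the same mini-batch.

```python
import torch

def parabolic_line_search_step(params, loss_fn, mu=0.1, max_step=1.0):
    """One update step of a parabolic-approximation line search (a sketch).

    Fits l(t) ~ a*t^2 + b*t + c to the batch loss along the normalized
    negative gradient direction, from l(0), l'(0), and one probe l(mu),
    then moves to the vertex t* = -b / (2a).
    """
    # Fresh gradients for the current batch loss.
    for p in params:
        p.grad = None
    loss0 = loss_fn()
    loss0.backward()
    with torch.no_grad():
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum((g * g).sum() for g in grads)).clamp_min(1e-12)
        # Directional derivative along the normalized negative gradient:
        # b = grad . (-grad / ||grad||) = -||grad||.
        b = -norm
        # Probe the loss at t = mu along the search direction.
        for p, g in zip(params, grads):
            p.add_(g, alpha=-mu / norm)
        loss_mu = loss_fn()
        # Curvature from l(mu) = a*mu^2 + b*mu + l(0).
        a = (loss_mu - loss0.detach() - b * mu) / mu**2
        # Step to the parabola's vertex if it is convex along the line;
        # otherwise fall back to the measuring step (a heuristic choice).
        t_star = (-b / (2 * a)).item() if a > 0 else mu
        t_star = min(t_star, max_step)
        # Already at t = mu, so move the remaining (t_star - mu).
        for p, g in zip(params, grads):
            p.add_(g, alpha=-(t_star - mu) / norm)
    return loss0.item(), t_star
```

Both loss measurements must see the same mini-batch so that the parabola is fit to a single batch loss landscape; refinements described in the paper, such as conjugate-gradient search directions and adaptation of the measuring step, are omitted here.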