We propose a novel method for training deep neural networks that are capable of interpolation, that is, of driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and at other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). In order to operationalise BORAT, we design a novel algorithm for optimising the bundle approximation efficiently at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single-hyperparameter optimisation algorithms. Our experiments demonstrate that BORAT matches the state-of-the-art generalisation performance of these methods and is the most robust.
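To make the role of the constant lower bound concrete, the following is a minimal sketch of the simplest two-piece bundle: the linearisation of the loss at the current iterate together with the constant zero lower bound (valid under interpolation, where the minimal empirical loss is zero). Minimising this bundle plus a proximal term admits a closed-form solution, which yields a Polyak-style adaptive step size clipped at a maximal learning rate. This is only the degenerate special case; the full BORAT bundle uses several linearisations and requires solving a small quadratic program at each iteration, which is not shown here. All function and variable names below are illustrative, not from the paper's code.

```python
import torch

def two_piece_bundle_step(params, loss, max_lr=0.1, eps=1e-8):
    """One step using a two-piece bundle: max of (i) the linearisation of the
    loss at the current iterate and (ii) the constant zero lower bound.
    Solving  argmin_w  max(l + <g, w - w_t>, 0) + ||w - w_t||^2 / (2 * max_lr)
    in closed form gives an SGD step with the adaptive step size
    min(max_lr, l / ||g||^2)."""
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum(g.pow(2).sum() for g in grads)
    # Adaptive learning rate: Polyak-style step clipped at max_lr. The zero
    # lower bound in the bundle is what makes loss / ||g||^2 a sensible step.
    step_size = torch.clamp(loss.detach() / (grad_sq_norm + eps), max=max_lr)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(step_size * g)

# Hypothetical usage on one mini-batch:
# loss = criterion(model(x), y)
# two_piece_bundle_step(list(model.parameters()), loss, max_lr=0.1)
```

Adding further linear pieces to the bundle, as BORAT does, replaces this closed-form step with a low-dimensional quadratic program over the convex combinations of the linearisations, which is what buys the increased robustness to the single hyperparameter `max_lr`.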