Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization, such as "explicit regularization" (via weight decay), are often used to avoid overfitting, especially when the data is corrupted. There are several challenges with explicit regularization, most notably unclear convergence properties. Inspired by convergence properties of stochastic mirror descent (SMD) algorithms, we propose a new method for training DNNs with regularization, called regularizer mirror descent (RMD). In highly overparameterized DNNs, SMD simultaneously interpolates the training data and minimizes a certain potential function of the weights. RMD starts with a standard cost which is the sum of the training loss and a convex regularizer of the weights. Reinterpreting this cost as the potential of an "augmented" overparameterized network and applying SMD yields RMD. As a result, RMD inherits the properties of SMD and provably converges to a point "close" to the minimizer of this cost. RMD is computationally comparable to stochastic gradient descent (SGD) and weight decay, and is parallelizable in the same manner. Our experimental results on training sets with various levels of corruption suggest that the generalization performance of RMD is remarkably robust and significantly better than both SGD and weight decay, which implicitly and explicitly regularize the $\ell_2$ norm of the weights. RMD can also be used to regularize the weights to a desired weight vector, which is particularly relevant for continual learning.
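As a rough illustration (in generic notation, not necessarily the paper's exact formulation), stochastic mirror descent with a strictly convex potential $\psi$ updates the weights $w$ on a sampled per-example loss $\ell_{i_t}$ via
$$\nabla \psi(w_{t+1}) \;=\; \nabla \psi(w_t) \;-\; \eta \, \nabla \ell_{i_t}(w_t),$$
and, in the highly overparameterized (interpolating) regime, converges to the interpolating solution closest to the initialization in the Bregman divergence of $\psi$. The regularized cost described above has the generic form
$$\min_{w} \;\sum_{i=1}^{n} \ell_i(w) \;+\; \lambda R(w), \qquad R \text{ convex},$$
and reinterpreting this cost as the potential of an augmented overparameterized network lets the SMD convergence guarantee carry over to a point close to its minimizer. Here $\eta$, $\lambda$, and $R$ are illustrative placeholders rather than the paper's specific parameterization.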