Deep neural networks (DNNs) have proven successful as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include non-convexity, non-smoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving many training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with non-standard, complex, and scarce datasets, e.g., those arising in many scientific applications. To remedy the challenges of DNN training, we propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters and fast initial convergence. The central idea of slimTrain is to exploit the separability inherent in many DNN architectures; that is, we separate the DNN into a nonlinear feature extractor followed by a linear model. This separability allows us to leverage recent advances made for solving large-scale, linear, ill-posed inverse problems. Crucially, for the linear weights, slimTrain does not require a learning rate and automatically adapts the regularization parameter. Since our method operates on mini-batches, its computational overhead per iteration is modest. In our numerical experiments, slimTrain outperforms existing DNN training methods with the recommended hyperparameter settings and reduces the sensitivity of DNN training to the remaining hyperparameters.
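To illustrate the separable structure described above, the following sketch splits a toy regression model into a nonlinear feature extractor and a linear output layer. On each mini-batch, the linear weights are obtained in closed form from a Tikhonov-regularized least-squares solve, so no learning rate is needed for them, while the feature extractor is updated with a standard stochastic optimizer. This is a minimal illustration under simplifying assumptions, not slimTrain itself: the fixed regularization parameter `lam`, the helper `solve_linear_weights`, and the full ridge solve are placeholders; slimTrain adapts the regularization parameter automatically and uses specialized mini-batch updates from the inverse-problems literature rather than this exact dense solve.

```python
# Minimal sketch of separable DNN training (illustrative, not the authors' code).
# The model is split into a nonlinear feature extractor f_theta and linear
# weights W. Per mini-batch, W comes from a closed-form ridge solve (no learning
# rate); theta is updated by Adam on the resulting loss, with W held fixed.
import torch

torch.manual_seed(0)

# Toy regression data: x in R^2, y in R^1 (illustrative only).
x = torch.randn(512, 2)
y = torch.sin(x[:, :1]) + 0.1 * torch.randn(512, 1)

feature_extractor = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(),
    torch.nn.Linear(16, 16), torch.nn.Tanh(),
)
opt = torch.optim.Adam(feature_extractor.parameters(), lr=1e-3)
lam = 1e-2  # Tikhonov regularization parameter (fixed here; adaptive in slimTrain)

def solve_linear_weights(Z, Y, lam):
    """Closed-form ridge solution W = (Z^T Z + lam*I)^{-1} Z^T Y."""
    k = Z.shape[1]
    A = Z.T @ Z + lam * torch.eye(k)
    return torch.linalg.solve(A, Z.T @ Y)

for epoch in range(100):
    perm = torch.randperm(x.shape[0])
    for i in range(0, x.shape[0], 64):
        idx = perm[i:i + 64]
        Z = feature_extractor(x[idx])                       # nonlinear features
        W = solve_linear_weights(Z.detach(), y[idx], lam)   # no learning rate for W
        loss = torch.nn.functional.mse_loss(Z @ W, y[idx])  # loss with W eliminated
        opt.zero_grad()
        loss.backward()   # gradient flows to theta only; W is detached
        opt.step()
```

Detaching `Z` inside the linear solve treats the feature extractor as fixed for that step, mirroring the alternating structure of separable training; differentiating through the solve (full variable projection) is also possible but more expensive per iteration.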