The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single versus double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been trained to minimize the Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. The computing experiments show that single precision can keep up with double precision (with superlinear convergence) as long as the line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. For moderately nonlinear tasks, however, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions with a fairly poor MSE relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
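A minimal sketch of how a training task with a known zero MSE minimum might be constructed and how the working precision sets a floor on the reachable error. The teacher-student construction, the network size, and all names (forward, teacher, N_IN, N_HIDDEN) are illustrative assumptions, not the paper's actual experimental code.

```python
import numpy as np

N_IN, N_HIDDEN, N_SAMPLES = 10, 32, 1000

def forward(params, X, n_in=N_IN, n_hidden=N_HIDDEN):
    # One-hidden-layer tanh network with a flattened parameter vector.
    W1 = params[: n_in * n_hidden].reshape(n_in, n_hidden)
    W2 = params[n_in * n_hidden:].reshape(n_hidden, 1)
    return np.tanh(X @ W1) @ W2

# Assumed task construction: targets are produced by a fixed "teacher"
# network of the same architecture, so the attainable MSE minimum is zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((N_SAMPLES, N_IN))
teacher = rng.standard_normal(N_IN * N_HIDDEN + N_HIDDEN)
Y = forward(teacher, X)  # reference targets computed in float64

for dtype in (np.float64, np.float32):
    # Evaluate the teacher parameters in each working precision; the gap to
    # the exact zero minimum reflects the rounding floor of that precision.
    pred = forward(teacher.astype(dtype), X.astype(dtype))
    mse = float(np.mean((pred.astype(np.float64) - Y) ** 2))
    print(dtype.__name__, mse)
```

Under this construction the double-precision evaluation reproduces the targets exactly, while the single-precision evaluation leaves a small residual MSE at the rounding level, which is the kind of precision floor the experiments probe.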