Training deep neural networks consumes an increasing share of computational resources in many compute centers. Often, hyperparameter values are obtained by brute-force search. Our goal is (1) to improve on this by enabling second-order optimization methods, which require fewer hyperparameters, for large-scale neural networks, and (2) to survey the performance of optimizers on specific tasks in order to suggest to users the best one for their problem. We introduce a novel second-order optimization method that requires only the effect of the Hessian on a vector and thus avoids the prohibitive cost of explicitly forming the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression, very deep networks from computer vision, and variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
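To illustrate the core idea of using only Hessian-vector products rather than the full Hessian, here is a minimal sketch using JAX-style automatic differentiation. The toy loss, parameter names, and the `hvp` helper are illustrative assumptions, not the paper's actual implementation; the point is that forward-over-reverse differentiation yields the product of the Hessian with a direction vector without ever materializing the Hessian matrix.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy loss standing in for the training objective.
def loss(params, x, y):
    hidden = jnp.tanh(x @ params["w"])
    pred = hidden @ params["v"]
    return jnp.mean((pred - y) ** 2)

def hvp(loss_fn, params, vec, *args):
    """Hessian-vector product via forward-over-reverse autodiff:
    differentiate the gradient of loss_fn in the direction `vec`,
    so the full Hessian is never formed explicitly."""
    grad_fn = lambda p: jax.grad(loss_fn)(p, *args)
    _, hv = jax.jvp(grad_fn, (params,), (vec,))
    return hv

# Usage sketch: the product H @ v has the same (pytree) structure as params.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (4, 8)), "v": jax.random.normal(key, (8, 1))}
x, y = jnp.ones((16, 4)), jnp.zeros((16, 1))
v = jax.tree_util.tree_map(jnp.ones_like, params)
print(hvp(loss, params, v, x, y))
```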