We investigate the role of the optimizer in determining the quality of the model fit for neural networks with a small to medium number of parameters. We study the performance of Adam, a first-order gradient-based optimization algorithm that uses adaptive moment estimation; the Levenberg-Marquardt (LM) algorithm, a second-order method; the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, another second-order method; and LBFGS, a low-memory version of BFGS. Using these optimizers we fit the function y = sinc(10x) with a neural network that has only a few parameters. This function has a variable amplitude and a constant frequency. We observe that the higher-amplitude components of the function are fitted first, and that Adam, BFGS and LBFGS struggle to fit the lower-amplitude components. We also solve the Burgers equation using a physics-informed neural network (PINN) with the BFGS and LM optimizers. For our example problems with a small to medium number of weights, we find that the LM algorithm is able to rapidly converge to machine precision, offering significant benefits over the other optimizers. We further investigate the Adam optimizer with a range of models and find that it requires much deeper models with large numbers of hidden units, containing up to 26x more parameters, in order to achieve a model fit close to that achieved by the LM optimizer. The LM optimizer results illustrate that it may be possible to build models with far fewer parameters. We have implemented all our methods in Keras and TensorFlow 2.
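A minimal sketch of the first experiment, assuming illustrative choices of our own (network width, activation, learning rate, epoch count and sampling grid are not the paper's exact setup): a small dense network fitted to y = sinc(10x) with the Adam optimizer in Keras / TensorFlow 2.

```python
# Illustrative sketch only: hyperparameters below are assumptions, not the paper's exact configuration.
import numpy as np
import tensorflow as tf

# Training data: the unnormalized sinc, sin(10x)/(10x), sampled on [-1, 1].
# np.sinc(z) computes sin(pi z)/(pi z), so rescale the argument to obtain sin(10x)/(10x).
x = np.linspace(-1.0, 1.0, 1000, dtype=np.float32).reshape(-1, 1)
y = np.sinc(10.0 * x / np.pi)

# A small network with only a few dozen parameters.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(10, activation="tanh"),
    tf.keras.layers.Dense(1),
])

# First-order training with Adam; second-order optimizers such as LM or BFGS
# require a custom training loop and are not shown in this sketch.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(x, y, epochs=2000, batch_size=100, verbose=0)

print("final MSE:", model.evaluate(x, y, verbose=0))
```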